gpl.pierrick.brihaye.aramorph.lucene
Class ArabicStemAnalyzer
java.lang.Object
org.apache.lucene.analysis.Analyzer
gpl.pierrick.brihaye.aramorph.lucene.ArabicStemAnalyzer
- public final class ArabicStemAnalyzer
- extends org.apache.lucene.analysis.Analyzer
Analyzer for the Arabic language. This analyzer uses Tim Buckwalter's algorithm (available from the LDC Catalog) to identify the morphological category of Arabic tokens.
The set of significant grammatical categories is still to be determined, but the current list gives good results.
By default, final tokens are a romanized (Buckwalter-transliterated) version of the canonical word.
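Buckwalter's transliteration maps each Arabic letter and diacritic to a single ASCII character, so a romanized token can be read back unambiguously. The sketch below is a hypothetical, self-contained helper: the names `BuckwalterSketch` and `transliterate` are illustrative and are not part of AraMorph. It applies the one-to-one Buckwalter table for the core letters and diacritics (U+0621 through U+063A and U+0640 through U+0652) and passes every other character through unchanged.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of AraMorph: Arabic script -> Buckwalter ASCII.
final class BuckwalterSketch {

    // Buckwalter symbols for U+0621..U+063A, in code-point order (hamza .. ghain).
    private static final String BW_0621 = "'|>&<}AbptvjHxd*rzs$SDTZEg";
    // Buckwalter symbols for U+0640..U+0652 (tatweel, feh .. yeh, then the diacritics).
    private static final String BW_0640 = "_fqklmnhwYyFNKaui~o";

    private static final Map<Character, Character> ARABIC_TO_BUCKWALTER = new HashMap<>();

    static {
        for (int i = 0; i < BW_0621.length(); i++) {
            ARABIC_TO_BUCKWALTER.put((char) (0x0621 + i), BW_0621.charAt(i));
        }
        for (int i = 0; i < BW_0640.length(); i++) {
            ARABIC_TO_BUCKWALTER.put((char) (0x0640 + i), BW_0640.charAt(i));
        }
    }

    /** Maps each Arabic character to its Buckwalter symbol; other characters pass through. */
    static String transliterate(String arabic) {
        StringBuilder sb = new StringBuilder(arabic.length());
        for (char c : arabic.toCharArray()) {
            char mapped = ARABIC_TO_BUCKWALTER.getOrDefault(c, c);
            sb.append(mapped);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "al-kitab" (the book), written with Unicode escapes
        System.out.println(transliterate("\u0627\u0644\u0643\u062A\u0627\u0628")); // AlktAb
    }
}
```

Because the table is one-to-one, the same two strings can be read in the opposite direction to restore Arabic script.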
- Author:
- Pierrick Brihaye, 2003
Field Summary
protected boolean outputBuckwalter
- Whether or not the analyzer should output tokens in the Buckwalter transliteration system
Constructor Summary
ArabicStemAnalyzer()
- Constructs an analyzer that will return grammatically significant Arabic tokens in the Buckwalter transliteration system
ArabicStemAnalyzer(boolean outputBuckwalter)
- Constructs an analyzer that will return grammatically significant Arabic tokens
Method Summary
org.apache.lucene.analysis.TokenStream tokenStream(java.lang.String FieldName, java.io.Reader reader)
- Returns a token stream of Arabic words whose grammatical categories are found to be significant
Methods inherited from class org.apache.lucene.analysis.Analyzer
tokenStream
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
outputBuckwalter
protected boolean outputBuckwalter
- Whether or not the analyzer should output tokens in the Buckwalter transliteration system
ArabicStemAnalyzer
public ArabicStemAnalyzer()
- Constructs an analyzer that will return grammatically significant Arabic tokens in the Buckwalter transliteration system.
ArabicStemAnalyzer
public ArabicStemAnalyzer(boolean outputBuckwalter)
- Constructs an analyzer that will return grammatically significant Arabic tokens.
- Parameters:
outputBuckwalter
- Whether or not the tokens should be transliterated into the Buckwalter system
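A minimal sketch of the choice this flag controls, under an assumption not confirmed by this page: that the analyzer handles stems in Buckwalter form internally and converts them back to Arabic script only when `outputBuckwalter` is false. The names `OutputFormSketch` and `emit` are hypothetical; the lookup table is the inverse of the one-to-one Buckwalter mapping for U+0621 through U+063A and U+0640 through U+0652.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not AraMorph code: what the outputBuckwalter flag could select.
final class OutputFormSketch {

    // Buckwalter symbols paired, position by position, with U+0621..U+063A and U+0640..U+0652.
    private static final String BW_0621 = "'|>&<}AbptvjHxd*rzs$SDTZEg";
    private static final String BW_0640 = "_fqklmnhwYyFNKaui~o";

    private static final Map<Character, Character> BUCKWALTER_TO_ARABIC = new HashMap<>();

    static {
        for (int i = 0; i < BW_0621.length(); i++) {
            BUCKWALTER_TO_ARABIC.put(BW_0621.charAt(i), (char) (0x0621 + i));
        }
        for (int i = 0; i < BW_0640.length(); i++) {
            BUCKWALTER_TO_ARABIC.put(BW_0640.charAt(i), (char) (0x0640 + i));
        }
    }

    /** Returns the token as-is when outputBuckwalter is true, otherwise in Arabic script. */
    static String emit(String buckwalterToken, boolean outputBuckwalter) {
        if (outputBuckwalter) {
            return buckwalterToken;
        }
        StringBuilder sb = new StringBuilder(buckwalterToken.length());
        for (char c : buckwalterToken.toCharArray()) {
            char mapped = BUCKWALTER_TO_ARABIC.getOrDefault(c, c);
            sb.append(mapped);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(emit("ktb", true));  // ktb
        System.out.println(emit("ktb", false)); // the same stem in Arabic script
    }
}
```

Keeping the romanized form (the default) yields pure-ASCII index terms; converting back gives terms that match Arabic query text directly.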
tokenStream
public org.apache.lucene.analysis.TokenStream tokenStream(java.lang.String FieldName,
java.io.Reader reader)
- Returns a token stream of Arabic words whose grammatical categories are found to be significant.
- Parameters:
FieldName
- The name of the field whose content is being tokenized
reader
- The reader supplying the text to tokenize
- Returns:
- The token stream
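The selection this method describes can be pictured as a filter: analyze each word morphologically, then keep only the stems whose grammatical category belongs to a set judged significant. The sketch below is hypothetical throughout. `Solution` is an assumed (stem, category) pair, the tag names follow the style of Buckwalter's analyzer output (NOUN, VERB_PERFECT, PREP), and the contents of `SIGNIFICANT` are an illustrative guess, since the analyzer's actual list is not given in this documentation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not AraMorph code: keep stems whose category is "significant".
final class CategoryFilterSketch {

    /** One morphological analysis of a word: a stem and its grammatical category. */
    static final class Solution {
        final String stem;
        final String category;

        Solution(String stem, String category) {
            this.stem = stem;
            this.category = category;
        }
    }

    /** Illustrative choice only; the analyzer's real list is not documented here. */
    static final Set<String> SIGNIFICANT = new HashSet<>(
            Arrays.asList("NOUN", "NOUN_PROP", "ADJ", "VERB_PERFECT", "VERB_IMPERFECT"));

    /** Returns the stems of all solutions whose category is significant, in input order. */
    static List<String> significantStems(List<Solution> solutions, Set<String> significant) {
        List<String> stems = new ArrayList<>();
        for (Solution s : solutions) {
            if (significant.contains(s.category)) {
                stems.add(s.stem);
            }
        }
        return stems;
    }

    public static void main(String[] args) {
        List<Solution> analyses = Arrays.asList(
                new Solution("katab", "VERB_PERFECT"),
                new Solution("fiy", "PREP"),
                new Solution("kitAb", "NOUN"));
        System.out.println(significantStems(analyses, SIGNIFICANT)); // [katab, kitAb]
    }
}
```

Dropping function words such as prepositions keeps the index focused on content-bearing stems, which is the stated purpose of returning only grammatically significant tokens.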