gpl.pierrick.brihaye.aramorph.lucene
Class ArabicGlossAnalyzer
java.lang.Object
org.apache.lucene.analysis.Analyzer
gpl.pierrick.brihaye.aramorph.lucene.ArabicGlossAnalyzer
- public final class ArabicGlossAnalyzer
- extends org.apache.lucene.analysis.Analyzer
An English glosser for the Arabic language. This glosser uses Tim Buckwalter's algorithm
(available at the LDC Catalog) to identify the grammatical category of Arabic tokens and then returns their glosses.
The set of significant grammatical categories is still to be determined, but the current list gives
good results.
- Author:
- Pierrick Brihaye, 2003
Field Summary
static java.lang.String[] STOP_WORDS
An array containing some common English words that are usually not useful for searching.
Method Summary
org.apache.lucene.analysis.TokenStream tokenStream(java.lang.String fieldName, java.io.Reader reader)
Returns a token stream of glosses of Arabic words whose grammatical categories are found to be significant.
Methods inherited from class org.apache.lucene.analysis.Analyzer
tokenStream
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
STOP_WORDS
public static final java.lang.String[] STOP_WORDS
- An array containing some common English words that are usually not
useful for searching.
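The role of a stop-word array such as STOP_WORDS can be illustrated with a minimal, self-contained sketch. It does not use Lucene or the analyzer itself: the class GlossStopWordSketch, the method removeStopWords, and the array SAMPLE_STOP_WORDS are hypothetical names introduced here, and the sample words are an assumption, since the actual contents of STOP_WORDS are not listed on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class GlossStopWordSketch {
    // Hypothetical stand-in for ArabicGlossAnalyzer.STOP_WORDS;
    // the real array's contents are not documented here.
    static final String[] SAMPLE_STOP_WORDS = {"a", "an", "of", "the", "to"};

    // Returns the gloss words that are not in the stop-word array,
    // comparing case-insensitively and preserving the input order.
    static List<String> removeStopWords(List<String> glossWords, String[] stopWords) {
        Set<String> stopSet = new HashSet<>(Arrays.asList(stopWords));
        List<String> kept = new ArrayList<>();
        for (String word : glossWords) {
            if (!stopSet.contains(word.toLowerCase(Locale.ROOT))) {
                kept.add(word);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> gloss = Arrays.asList("the", "book", "of", "the", "writer");
        // prints [book, writer]
        System.out.println(removeStopWords(gloss, SAMPLE_STOP_WORDS));
    }
}
```

Dropping such words from English glosses keeps the indexed terms focused on the words that carry the meaning of the Arabic token.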
ArabicGlossAnalyzer
public ArabicGlossAnalyzer()
tokenStream
public org.apache.lucene.analysis.TokenStream tokenStream(java.lang.String fieldName,
java.io.Reader reader)
- Returns a token stream of glosses of Arabic words whose grammatical categories are found to be significant.
- Parameters:
fieldName
- The name of the field whose content is analyzed
reader
- The reader
- Returns:
- The token stream
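The idea of emitting glosses only for tokens whose grammatical category is significant can be sketched without Lucene or the morphological analyzer. Everything in the block below is an assumption made for illustration: the class SignificantGlossSketch, the AnalyzedToken type, and the category labels NOUN, VERB, ADJ and PREP are hypothetical, and the list of categories that ArabicGlossAnalyzer actually treats as significant is not given on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SignificantGlossSketch {
    // Hypothetical result of analyzing one token: a category label and an English gloss.
    static final class AnalyzedToken {
        final String category;
        final String gloss;

        AnalyzedToken(String category, String gloss) {
            this.category = category;
            this.gloss = gloss;
        }
    }

    // Hypothetical set of "significant" categories; the analyzer's real list is not documented here.
    static final Set<String> SIGNIFICANT =
            new HashSet<>(Arrays.asList("NOUN", "VERB", "ADJ"));

    // Keeps the gloss of every token whose category is in the significant set, in input order.
    static List<String> significantGlosses(List<AnalyzedToken> tokens) {
        List<String> glosses = new ArrayList<>();
        for (AnalyzedToken token : tokens) {
            if (SIGNIFICANT.contains(token.category)) {
                glosses.add(token.gloss);
            }
        }
        return glosses;
    }

    public static void main(String[] args) {
        List<AnalyzedToken> tokens = Arrays.asList(
                new AnalyzedToken("NOUN", "book"),
                new AnalyzedToken("PREP", "in"),
                new AnalyzedToken("NOUN", "house"));
        // prints [book, house]
        System.out.println(significantGlosses(tokens));
    }
}
```

In the real class this selection happens inside the token stream returned by tokenStream, so callers only see the glosses that survive it.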