gpl.pierrick.brihaye.aramorph.lucene
Class ArabicGlossAnalyzer

java.lang.Object
  extended by org.apache.lucene.analysis.Analyzer
      extended by gpl.pierrick.brihaye.aramorph.lucene.ArabicGlossAnalyzer

public final class ArabicGlossAnalyzer
extends org.apache.lucene.analysis.Analyzer

An English glosser for the Arabic language. This glosser uses Tim Buckwalter's algorithm (available from the LDC Catalog) to identify the grammatical category of Arabic tokens and then return their glosses. The set of significant grammatical categories has not been finalized, but the current list gives good results.
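
The idea described above (keep only tokens whose grammatical category is significant, then emit their glosses) can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the analyzer's actual implementation: the `Analysis` type, the category labels, and the contents of `SIGNIFICANT` are all hypothetical, and the real category list lives in the library itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class GlossFilterSketch {

    /** Hypothetical record of one analyzed token: its category and English gloss. */
    public static final class Analysis {
        public final String category;
        public final String gloss;

        public Analysis(String category, String gloss) {
            this.category = category;
            this.gloss = gloss;
        }
    }

    // Assumed set of "significant" categories, for illustration only.
    static final Set<String> SIGNIFICANT =
            new HashSet<>(Arrays.asList("NOUN", "VERB", "ADJ"));

    /** Returns the glosses of those analyses whose category is significant, in input order. */
    public static List<String> glossesOf(List<Analysis> analyses) {
        List<String> glosses = new ArrayList<>();
        for (Analysis a : analyses) {
            if (SIGNIFICANT.contains(a.category)) {
                glosses.add(a.gloss);
            }
        }
        return glosses;
    }

    public static void main(String[] args) {
        List<Analysis> analyses = Arrays.asList(
                new Analysis("NOUN", "book"),
                new Analysis("PREP", "in"),
                new Analysis("VERB", "write"));
        System.out.println(glossesOf(analyses)); // prints [book, write]
    }
}
```

In this sketch the preposition is dropped because its category is not in the assumed set, so only the noun and verb glosses reach the output.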

Author:
Pierrick Brihaye, 2003

Field Summary
static java.lang.String[] STOP_WORDS
          An array containing some common English words that are usually not useful for searching.
 
Constructor Summary
ArabicGlossAnalyzer()
           
 
Method Summary
 org.apache.lucene.analysis.TokenStream tokenStream(java.lang.String fieldName, java.io.Reader reader)
          Returns a token stream of glosses of Arabic words whose grammatical categories are found to be significant.
 
Methods inherited from class org.apache.lucene.analysis.Analyzer
tokenStream
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

STOP_WORDS

public static final java.lang.String[] STOP_WORDS
An array containing some common English words that are usually not useful for searching.
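
As a rough illustration of how a stop-word array of this kind is typically applied, the sketch below removes stop words from a list of English tokens. It is a sketch under assumptions: `SAMPLE_STOP_WORDS` is a hypothetical stand-in (the actual contents of `STOP_WORDS` are not listed on this page), and `removeStopWords` is an illustrative helper, not a method of this class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopWordSketch {

    // Hypothetical sample; the real STOP_WORDS array may contain different words.
    static final String[] SAMPLE_STOP_WORDS = {"a", "an", "the", "of", "to"};

    /** Returns the tokens that are not stop words, compared case-insensitively. */
    public static List<String> removeStopWords(List<String> tokens, String[] stopWords) {
        Set<String> stop = new HashSet<>(Arrays.asList(stopWords));
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!stop.contains(token.toLowerCase(Locale.ROOT))) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("The", "book", "of", "history");
        System.out.println(removeStopWords(tokens, SAMPLE_STOP_WORDS)); // prints [book, history]
    }
}
```

Copying the array into a `HashSet` keeps each lookup constant-time, which matters when the filter runs once per token.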

Constructor Detail

ArabicGlossAnalyzer

public ArabicGlossAnalyzer()

Method Detail

tokenStream

public org.apache.lucene.analysis.TokenStream tokenStream(java.lang.String fieldName,
                                                          java.io.Reader reader)
Returns a token stream of glosses of Arabic words whose grammatical categories are found to be significant.

Parameters:
fieldName - The name of the field
reader - The reader
Returns:
The token stream