fr.gouv.culture.sdx.search.lucene.analysis
Class Analyzer_ar

java.lang.Object
  extended by org.apache.lucene.analysis.Analyzer
      extended by fr.gouv.culture.sdx.search.lucene.analysis.AbstractAnalyzer
          extended by fr.gouv.culture.sdx.search.lucene.analysis.Analyzer_ar
All Implemented Interfaces:
Analyzer, java.io.Serializable, org.apache.avalon.framework.configuration.Configurable, org.apache.avalon.framework.logger.LogEnabled, org.apache.excalibur.xml.sax.XMLizable

public final class Analyzer_ar
extends AbstractAnalyzer

Analyzer for the Arabic language. This analyzer uses Tim Buckwalter's morphological analysis algorithm (available from the LDC Catalog) to identify the morphological category of Arabic tokens. The set of relevant categories is still to be determined, but the current list gives good results. The final tokens are romanized, canonical forms of the words.
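The following minimal sketch shows what a romanized token can look like. It assumes that the romanization follows the ASCII transliteration scheme associated with Buckwalter's analyzer. The class BuckwalterSketch and its transliterate method are hypothetical and are not part of SDX. The sketch maps only the basic letters and short-vowel diacritics and performs no morphological analysis, so it does not produce the canonical (stem) form on its own.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper, not part of SDX: maps Arabic characters to their
 * Buckwalter transliteration symbols (letters and diacritics only).
 */
public final class BuckwalterSketch {

    private static final Map<Character, Character> TABLE = new HashMap<>();

    static {
        // Selected Arabic letters, tatweel and diacritics...
        String arabic = "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062A"
                + "\u062B\u062C\u062D\u062E\u062F\u0630\u0631\u0632\u0633\u0634"
                + "\u0635\u0636\u0637\u0638\u0639\u063A\u0640\u0641\u0642\u0643"
                + "\u0644\u0645\u0646\u0647\u0648\u0649\u064A\u064B\u064C\u064D"
                + "\u064E\u064F\u0650\u0651\u0652\u0670\u0671";
        // ...and their Buckwalter symbols, in the same order.
        String latin = "'|>&<}Abpt"
                + "vjHxd*rzs$"
                + "SDTZEg_fqk"
                + "lmnhwYyFNK"
                + "aui~o`{";
        if (arabic.length() != latin.length()) {
            throw new IllegalStateException("transliteration table is misaligned");
        }
        for (int i = 0; i < arabic.length(); i++) {
            TABLE.put(arabic.charAt(i), latin.charAt(i));
        }
    }

    private BuckwalterSketch() {
    }

    /** Transliterates a word; characters outside the table are copied unchanged. */
    public static String transliterate(String word) {
        StringBuilder out = new StringBuilder(word.length());
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            Character mapped = TABLE.get(c);
            out.append(mapped != null ? mapped.charValue() : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "kitab" (book), written without diacritics
        System.out.println(transliterate("\u0643\u062A\u0627\u0628")); // prints ktAb
    }
}
```

Because the mapping is one character to one character and uses only ASCII, romanized tokens can be indexed and compared without concern for Unicode normalization of the original script.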

Author:
Pierrick Brihaye, 2003
See Also:
Serialized Form

Field Summary
protected static java.lang.String ANALYZER_TYPE
           
 
Fields inherited from class fr.gouv.culture.sdx.search.lucene.analysis.AbstractAnalyzer
logger
 
Constructor Summary
Analyzer_ar()
           
 
Method Summary
 void configure(org.apache.avalon.framework.configuration.Configuration configuration)
          Configure the glosser.
 void enableLogging(org.apache.avalon.framework.logger.Logger logger)
          Passes the logger (super.getLog()) to this class.
protected  java.lang.String getAnalyzerType()
           
 org.apache.lucene.analysis.TokenStream tokenStream(java.io.Reader reader)
          Deprecated. Use tokenStream(String, Reader) instead.
 org.apache.lucene.analysis.TokenStream tokenStream(java.lang.String fieldName, java.io.Reader reader)
          Returns a token stream of romanized Arabic words whose morphological categories are found to be semantically relevant.
 
Methods inherited from class fr.gouv.culture.sdx.search.lucene.analysis.AbstractAnalyzer
toSAX
 
Methods inherited from class org.apache.lucene.analysis.Analyzer
getPositionIncrementGap
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

ANALYZER_TYPE

protected static final java.lang.String ANALYZER_TYPE
See Also:
Constant Field Values
Constructor Detail

Analyzer_ar

public Analyzer_ar()
Method Detail

configure

public void configure(org.apache.avalon.framework.configuration.Configuration configuration)
               throws org.apache.avalon.framework.configuration.ConfigurationException
Configure the glosser.

Specified by:
configure in interface org.apache.avalon.framework.configuration.Configurable
Overrides:
configure in class AbstractAnalyzer
Parameters:
configuration - The configuration object
Throws:
org.apache.avalon.framework.configuration.ConfigurationException - If a problem occurs during configuration

enableLogging

public void enableLogging(org.apache.avalon.framework.logger.Logger logger)
Passes the logger (super.getLog()) to this class.

Specified by:
enableLogging in interface org.apache.avalon.framework.logger.LogEnabled
Overrides:
enableLogging in class AbstractAnalyzer
Parameters:
logger - The logger, i.e. super.getLog()

tokenStream

public final org.apache.lucene.analysis.TokenStream tokenStream(java.lang.String fieldName,
                                                                java.io.Reader reader)
Returns a token stream of romanized Arabic words whose morphological categories are found to be semantically relevant.

Specified by:
tokenStream in interface Analyzer
Specified by:
tokenStream in class org.apache.lucene.analysis.Analyzer
Parameters:
fieldName - The field name
reader - The reader
Returns:
The token stream
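The selection of "semantically relevant" tokens can be thought of as a set-membership test applied to each analyzed token. The following minimal sketch rests on several assumptions. The MorphToken type, the relevantTerms method and the category labels NOUN, VERB and ADJ are hypothetical. They do not reflect the actual category list used by Analyzer_ar, and the sketch does not use the Lucene TokenStream API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch, not the SDX implementation: keeps only the tokens
 * whose morphological category belongs to a configured "relevant" set.
 */
public final class CategoryFilterSketch {

    /** A romanized token paired with the morphological category assigned to it. */
    public static final class MorphToken {
        final String romanized;
        final String category;

        public MorphToken(String romanized, String category) {
            this.romanized = romanized;
            this.category = category;
        }
    }

    // Illustrative category labels only; the real list is maintained by the analyzer.
    static final Set<String> RELEVANT = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList("NOUN", "VERB", "ADJ")));

    private CategoryFilterSketch() {
    }

    /** Returns the romanized forms of the relevant tokens, in input order. */
    public static List<String> relevantTerms(List<MorphToken> tokens) {
        List<String> terms = new ArrayList<>();
        for (MorphToken t : tokens) {
            if (RELEVANT.contains(t.category)) {
                terms.add(t.romanized);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        List<MorphToken> analyzed = Arrays.asList(
                new MorphToken("ktAb", "NOUN"),   // "book"
                new MorphToken("fy", "PREP"),     // "in": a preposition, dropped
                new MorphToken("byt", "NOUN"));   // "house"
        System.out.println(relevantTerms(analyzed)); // prints [ktAb, byt]
    }
}
```

Because the relevant categories are kept in a single set, the list is easy to adjust while the choice of categories is still being refined.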

getAnalyzerType

protected java.lang.String getAnalyzerType()
Specified by:
getAnalyzerType in class AbstractAnalyzer
See Also:
fr.gouv.culture.sdx.search.lucene.analysis.AbstractAnalyzer#getAnalyzerType()

tokenStream

public org.apache.lucene.analysis.TokenStream tokenStream(java.io.Reader reader)
Deprecated. Use tokenStream(String, Reader) instead.

Creates a TokenStream which tokenizes all the text in the provided Reader. Provided for backward compatibility only.

See Also:
Analyzer.tokenStream(java.io.Reader)


Copyright © 2000-2010 Ministere de la culture et de la communication / AJLSM. All Rights Reserved.