fr.gouv.culture.sdx.search.lucene.analysis
Class Analyzer_ar
java.lang.Object
org.apache.lucene.analysis.Analyzer
fr.gouv.culture.sdx.search.lucene.analysis.AbstractAnalyzer
fr.gouv.culture.sdx.search.lucene.analysis.Analyzer_ar
- All Implemented Interfaces:
- Analyzer, java.io.Serializable, org.apache.avalon.framework.configuration.Configurable, org.apache.avalon.framework.logger.LogEnabled, org.apache.excalibur.xml.sax.XMLizable
public final class Analyzer_ar
- extends AbstractAnalyzer
Analyzer for the Arabic language. This analyzer uses Tim Buckwalter's algorithm
(available at the LDC
Catalog) to identify the morphological category of Arabic tokens.
The set of relevant categories has not been finalized, but the current list gives
good results.
The final tokens are a romanized, canonical form of each word.
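To give an idea of what a romanized token looks like, here is a minimal sketch of character-level Buckwalter transliteration. It is an illustration only, not this class's implementation: the class name BuckwalterSketch and the method romanize are hypothetical, the table covers only a subset of letters, and the real analyzer also performs morphological analysis to reach a canonical form, which this sketch does not attempt.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of SDX: maps Arabic letters to their
// Buckwalter transliteration symbols, one character at a time.
public class BuckwalterSketch {
    // Partial table (a subset of the scheme, for illustration only).
    static final Map<Character, Character> TABLE = new HashMap<>();
    static {
        TABLE.put('\u0627', 'A'); // alef
        TABLE.put('\u0628', 'b'); // beh
        TABLE.put('\u062A', 't'); // teh
        TABLE.put('\u062F', 'd'); // dal
        TABLE.put('\u0631', 'r'); // reh
        TABLE.put('\u0633', 's'); // seen
        TABLE.put('\u0643', 'k'); // kaf
        TABLE.put('\u0644', 'l'); // lam
        TABLE.put('\u0645', 'm'); // meem
        TABLE.put('\u0646', 'n'); // noon
        TABLE.put('\u0648', 'w'); // waw
        TABLE.put('\u064A', 'y'); // yeh
    }

    // Replaces each known Arabic letter; unknown characters pass through unchanged.
    static String romanize(String arabic) {
        StringBuilder out = new StringBuilder(arabic.length());
        for (char c : arabic.toCharArray()) {
            Character mapped = TABLE.get(c);
            out.append(mapped != null ? mapped : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // kaf, teh, alef, beh (the word for "book")
        System.out.println(romanize("\u0643\u062A\u0627\u0628"));
    }
}
```

With this partial table, the four-letter word above is printed as ktAb.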
- Author:
- Pierrick Brihaye, 2003
- See Also:
- Serialized Form
Method Summary |
void |
configure(org.apache.avalon.framework.configuration.Configuration configuration)
Configure the glosser. |
void |
enableLogging(org.apache.avalon.framework.logger.Logger logger)
Passes the logger returned by super.getLog() to this class. |
protected java.lang.String |
getAnalyzerType()
|
org.apache.lucene.analysis.TokenStream |
tokenStream(java.io.Reader reader)
Deprecated. Use tokenStream(String, Reader) instead. |
org.apache.lucene.analysis.TokenStream |
tokenStream(java.lang.String fieldName,
java.io.Reader reader)
Returns a token stream of romanized Arabic words whose morphological categories are found to be semantically relevant. |
Methods inherited from class org.apache.lucene.analysis.Analyzer |
getPositionIncrementGap |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
ANALYZER_TYPE
protected static final java.lang.String ANALYZER_TYPE
- See Also:
- Constant Field Values
Analyzer_ar
public Analyzer_ar()
configure
public void configure(org.apache.avalon.framework.configuration.Configuration configuration)
throws org.apache.avalon.framework.configuration.ConfigurationException
- Configure the glosser.
- Specified by:
configure
in interface org.apache.avalon.framework.configuration.Configurable
- Overrides:
configure
in class AbstractAnalyzer
- Parameters:
configuration
- The configuration object
- Throws:
org.apache.avalon.framework.configuration.ConfigurationException
- If a problem occurs during configuration
enableLogging
public void enableLogging(org.apache.avalon.framework.logger.Logger logger)
- Passes the logger returned by super.getLog() to this class.
- Specified by:
enableLogging
in interface org.apache.avalon.framework.logger.LogEnabled
- Overrides:
enableLogging
in class AbstractAnalyzer
- Parameters:
logger
- The logger, as returned by super.getLog()
tokenStream
public final org.apache.lucene.analysis.TokenStream tokenStream(java.lang.String fieldName,
java.io.Reader reader)
- Returns a token stream of romanized Arabic words whose morphological categories are found to be semantically relevant.
- Specified by:
tokenStream
in interface Analyzer
- Specified by:
tokenStream
in class org.apache.lucene.analysis.Analyzer
- Parameters:
reader
- The reader
fieldName
- The field
- Returns:
- The token stream
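The filtering step described for tokenStream(String, Reader) can be pictured with the following sketch. It assumes tokens have already been romanized and tagged with a morphological category; the class CategoryFilterSketch, the TaggedToken type, and the category labels in RELEVANT are all hypothetical placeholders, since the actual category list used by this analyzer is not given here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration, not SDX code: keeps only tokens whose
// morphological category is considered semantically relevant.
public class CategoryFilterSketch {
    // A romanized token paired with a morphological category label.
    static final class TaggedToken {
        final String form;
        final String category;

        TaggedToken(String form, String category) {
            this.form = form;
            this.category = category;
        }
    }

    // Assumed category labels; the analyzer's real list may differ.
    static final Set<String> RELEVANT =
            new HashSet<>(Arrays.asList("NOUN", "VERB", "ADJ"));

    // Returns the forms of the relevant tokens, in their original order.
    static List<String> keepRelevant(List<TaggedToken> tokens) {
        List<String> kept = new ArrayList<>();
        for (TaggedToken t : tokens) {
            if (RELEVANT.contains(t.category)) {
                kept.add(t.form);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<TaggedToken> tokens = Arrays.asList(
                new TaggedToken("ktAb", "NOUN"),
                new TaggedToken("fy", "PREP"));
        System.out.println(keepRelevant(tokens));
    }
}
```

In this sketch a preposition is dropped while a noun is kept, which mirrors the idea of indexing only words that carry meaning for search.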
getAnalyzerType
protected java.lang.String getAnalyzerType()
- Specified by:
getAnalyzerType
in class AbstractAnalyzer
- See Also:
fr.gouv.culture.sdx.search.lucene.analysis.AbstractAnalyzer#getAnalyserType()
tokenStream
public org.apache.lucene.analysis.TokenStream tokenStream(java.io.Reader reader)
- Deprecated. Use tokenStream(String, Reader) instead.
- Creates a TokenStream which tokenizes all the text in the provided
Reader. Provided for backward compatibility only.
- See Also:
Analyzer.tokenStream(java.io.Reader)
Copyright © 2000-2010 Ministere de la culture et de la communication / AJLSM. All Rights Reserved.