java.lang.Object
  org.apache.lucene.analysis.Analyzer
    fr.gouv.culture.sdx.search.lucene.analysis.AbstractAnalyzer
      fr.gouv.culture.sdx.search.lucene.analysis.Glosser_ar_en
public final class Glosser_ar_en
An English glosser for the Arabic language. This glosser uses Tim Buckwalter's algorithm (available from the LDC Catalog) to identify the morphological category of each Arabic token and then returns its glosses. The set of meaningful morphological categories is still to be determined, but the current list gives good results.
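The flow described above (look up each token, keep it only if its morphological category is considered meaningful, emit its gloss) can be sketched in plain Java. This is a toy illustration under stated assumptions, not the SDX or Buckwalter implementation: the class `GlossSketch`, the three-entry lexicon, the category names, and the set of "meaningful" categories are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy illustration only; every name and entry here is hypothetical. */
final class GlossSketch {

    /** A hypothetical analysis result: a morphological category and an English gloss. */
    static final class Analysis {
        final String category;
        final String gloss;

        Analysis(String category, String gloss) {
            this.category = category;
            this.gloss = gloss;
        }
    }

    // Hypothetical lexicon keyed by transliterated token.
    static final Map<String, Analysis> LEXICON = new HashMap<>();

    // Hypothetical list of categories treated as semantically meaningful.
    static final Set<String> MEANINGFUL = new HashSet<>();

    static {
        LEXICON.put("ktAb", new Analysis("NOUN", "book"));
        LEXICON.put("fy", new Analysis("PREP", "in"));
        LEXICON.put("ktb", new Analysis("VERB_PERFECT", "write"));
        MEANINGFUL.add("NOUN");
        MEANINGFUL.add("VERB_PERFECT");
    }

    /** Returns the glosses of known tokens whose category is in MEANINGFUL. */
    static List<String> glosses(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            Analysis analysis = LEXICON.get(token);
            // Unknown tokens and tokens in non-meaningful categories are skipped.
            if (analysis != null && MEANINGFUL.contains(analysis.category)) {
                out.add(analysis.gloss);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [book, write]
        System.out.println(glosses(Arrays.asList("ktAb", "fy", "ktb")));
    }
}
```

In the real class this filtering happens inside the token stream returned by tokenStream(String, Reader); the list-based version above only makes the selection rule visible.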
Field Summary | |
---|---|
protected static java.lang.String |
ANALYZER_TYPE
|
static java.lang.String[] |
STOP_WORDS
An array containing some common English words that are usually not useful for searching. |
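As a rough illustration of how such a stop-word array is typically applied, the sketch below drops common English words from a list of glosses. It assumes the glosser filters its English output this way, which this page does not confirm; `StopWordSketch`, `removeStopWords`, and the sample word list are hypothetical, and the actual contents of STOP_WORDS are not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy illustration only; not part of the SDX API. */
final class StopWordSketch {

    // Hypothetical sample; stands in for Glosser_ar_en.STOP_WORDS.
    static final String[] SAMPLE_STOP_WORDS = {"a", "an", "the", "of", "in", "to"};

    /** Returns the glosses that are not stop words, compared case-insensitively. */
    static List<String> removeStopWords(List<String> glosses, String[] stopWords) {
        Set<String> stop = new HashSet<>(Arrays.asList(stopWords));
        List<String> kept = new ArrayList<>();
        for (String gloss : glosses) {
            if (!stop.contains(gloss.toLowerCase(Locale.ROOT))) {
                kept.add(gloss);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // prints [book, history]
        System.out.println(
            removeStopWords(Arrays.asList("the", "book", "of", "history"), SAMPLE_STOP_WORDS));
    }
}
```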
Fields inherited from class fr.gouv.culture.sdx.search.lucene.analysis.AbstractAnalyzer |
---|
logger |
Constructor Summary | |
---|---|
Glosser_ar_en()
|
Method Summary | |
---|---|
void |
configure(org.apache.avalon.framework.configuration.Configuration configuration)
Configure the glosser. |
void |
enableLogging(org.apache.avalon.framework.logger.Logger logger)
Passes the logger (a super.getLog()) to the class. |
protected java.lang.String |
getAnalyzerType()
|
org.apache.lucene.analysis.TokenStream |
tokenStream(java.io.Reader reader)
Deprecated. Use tokenStream(String, Reader) instead. |
org.apache.lucene.analysis.TokenStream |
tokenStream(java.lang.String fieldName,
java.io.Reader reader)
Returns a token stream of glosses of Arabic words whose morphological categories are found to be semantically meaningful. |
Methods inherited from class fr.gouv.culture.sdx.search.lucene.analysis.AbstractAnalyzer |
---|
toSAX |
Methods inherited from class org.apache.lucene.analysis.Analyzer |
---|
getPositionIncrementGap |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected static final java.lang.String ANALYZER_TYPE
public static final java.lang.String[] STOP_WORDS
Constructor Detail |
---|
public Glosser_ar_en()
Method Detail |
---|
protected java.lang.String getAnalyzerType()
Overrides: getAnalyzerType in class AbstractAnalyzer

public void configure(org.apache.avalon.framework.configuration.Configuration configuration) throws org.apache.avalon.framework.configuration.ConfigurationException
Specified by: configure in interface org.apache.avalon.framework.configuration.Configurable
Overrides: configure in class AbstractAnalyzer
Parameters: configuration - The configuration object
Throws: org.apache.avalon.framework.configuration.ConfigurationException - If a problem occurs during configuration

public void enableLogging(org.apache.avalon.framework.logger.Logger logger)
Specified by: enableLogging in interface org.apache.avalon.framework.logger.LogEnabled
Overrides: enableLogging in class AbstractAnalyzer
Parameters: logger - The super.getLog()

public org.apache.lucene.analysis.TokenStream tokenStream(java.lang.String fieldName, java.io.Reader reader)
Specified by: tokenStream in interface Analyzer
Overrides: tokenStream in class org.apache.lucene.analysis.Analyzer
Parameters: reader - The reader

public org.apache.lucene.analysis.TokenStream tokenStream(java.io.Reader reader)
See also: Analyzer.tokenStream(java.io.Reader)