Analyzer_ar (SDX 2.4.1 API)

fr.gouv.culture.sdx.search.lucene.analysis
Class Analyzer_ar

java.lang.Object
  org.apache.lucene.analysis.Analyzer
      fr.gouv.culture.sdx.search.lucene.analysis.AbstractAnalyzer
          fr.gouv.culture.sdx.search.lucene.analysis.Analyzer_ar

All Implemented Interfaces:: Analyzer, java.io.Serializable, org.apache.avalon.framework.configuration.Configurable, org.apache.avalon.framework.logger.LogEnabled, org.apache.excalibur.xml.sax.XMLizable

public final class Analyzer_ar
extends AbstractAnalyzer

Analyzer for the Arabic language. This analyzer uses Tim Buckwalter's algorithm (available from the LDC Catalog) to identify the morphological category of Arabic tokens. The set of relevant categories has not been finalized, but the current list gives good results. The final tokens are a romanized canonical form of each word.

Author:: Pierrick Brihaye, 2003
See Also:: Serialized Form
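
As an illustration of what "romanized" can mean here, the sketch below maps a handful of Arabic letters to the ASCII symbols of the Buckwalter transliteration scheme. It is an assumption that the analyzer's output follows this scheme; the class name `BuckwalterRomanizationSketch`, the partial table, and the `romanize` helper are hypothetical and are not part of the SDX API. The real analyzer also performs morphological analysis, which this sketch does not attempt.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: not the SDX or Buckwalter implementation.
class BuckwalterRomanizationSketch {

    // Partial table: a few Arabic letters and their Buckwalter ASCII symbols.
    private static final Map<Character, Character> TABLE = new HashMap<>();
    static {
        TABLE.put('\u0627', 'A'); // alif
        TABLE.put('\u0628', 'b'); // ba
        TABLE.put('\u062A', 't'); // ta
        TABLE.put('\u062F', 'd'); // dal
        TABLE.put('\u0631', 'r'); // ra
        TABLE.put('\u0633', 's'); // sin
        TABLE.put('\u0643', 'k'); // kaf
        TABLE.put('\u0644', 'l'); // lam
        TABLE.put('\u0645', 'm'); // mim
        TABLE.put('\u0646', 'n'); // nun
        TABLE.put('\u0648', 'w'); // waw
        TABLE.put('\u064A', 'y'); // ya
    }

    // Replaces each mapped character; unmapped characters pass through unchanged.
    static String romanize(String arabic) {
        StringBuilder out = new StringBuilder(arabic.length());
        for (int i = 0; i < arabic.length(); i++) {
            char c = arabic.charAt(i);
            Character mapped = TABLE.get(c);
            out.append(mapped != null ? mapped.charValue() : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // kaf, ta, alif, ba: the unvocalized word for "book"
        System.out.println(romanize("\u0643\u062A\u0627\u0628")); // prints "ktAb"
    }
}
```

The input is written with Unicode escapes so that the example does not depend on the source file encoding.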

Field Summary
`protected static java.lang.String`	`ANALYZER_TYPE`

Fields inherited from class fr.gouv.culture.sdx.search.lucene.analysis.AbstractAnalyzer
`logger`

Constructor Summary
`Analyzer_ar()`

Method Summary
`void`	`configure(org.apache.avalon.framework.configuration.Configuration configuration)` Configure the glosser.
`void`	`enableLogging(org.apache.avalon.framework.logger.Logger logger)` Passes a logger to the class.
`protected java.lang.String`	`getAnalyzerType()`
`org.apache.lucene.analysis.TokenStream`	`tokenStream(java.io.Reader reader)` Deprecated. Use tokenStream(String, Reader) instead.
`org.apache.lucene.analysis.TokenStream`	`tokenStream(java.lang.String fieldName, java.io.Reader reader)` Returns a token stream of romanized Arabic words whose morphological categories are found to be semantically relevant.

Methods inherited from class fr.gouv.culture.sdx.search.lucene.analysis.AbstractAnalyzer
`toSAX`

Methods inherited from class org.apache.lucene.analysis.Analyzer
`getPositionIncrementGap`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Field Detail

ANALYZER_TYPE

protected static final java.lang.String ANALYZER_TYPE

See Also:: Constant Field Values

Constructor Detail

Analyzer_ar

public Analyzer_ar()

Method Detail

configure

public void configure(org.apache.avalon.framework.configuration.Configuration configuration)
               throws org.apache.avalon.framework.configuration.ConfigurationException

Configure the glosser.

Specified by:: configure in interface org.apache.avalon.framework.configuration.Configurable
Overrides:: configure in class AbstractAnalyzer

Parameters:: configuration - The configuration object
Throws:: org.apache.avalon.framework.configuration.ConfigurationException - If a problem occurs during configuration

enableLogging

public void enableLogging(org.apache.avalon.framework.logger.Logger logger)

Passes a logger to the class.

Specified by:: enableLogging in interface org.apache.avalon.framework.logger.LogEnabled
Overrides:: enableLogging in class AbstractAnalyzer

Parameters:: logger - The logger

tokenStream

public final org.apache.lucene.analysis.TokenStream tokenStream(java.lang.String fieldName,
                                                                java.io.Reader reader)

Returns a token stream of romanized Arabic words whose morphological categories are found to be semantically relevant.

Specified by:: tokenStream in interface Analyzer
Specified by:: tokenStream in class org.apache.lucene.analysis.Analyzer

Parameters:: fieldName - The field name; reader - The reader
Returns:: The token stream
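
The filtering idea described above can be sketched without Lucene: each word carries a romanized form and a morphological category, and only words whose category is on a "relevant" list are kept. Everything in this sketch is hypothetical, including the `AnalyzedWord` type, the category labels (`NOUN`, `VERB`, `ADJ`, `PREP`), and the sample words; the actual category list used by Analyzer_ar is not documented on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: the category labels and the AnalyzedWord type are hypothetical.
class CategoryFilterSketch {

    // A romanized word paired with the morphological category assigned to it.
    static final class AnalyzedWord {
        final String romanized;
        final String category;

        AnalyzedWord(String romanized, String category) {
            this.romanized = romanized;
            this.category = category;
        }
    }

    // Hypothetical list of categories treated as semantically relevant.
    static final Set<String> RELEVANT =
            new HashSet<>(Arrays.asList("NOUN", "VERB", "ADJ"));

    // Keeps the romanized form of every word whose category is in RELEVANT,
    // preserving the input order.
    static List<String> relevantTerms(List<AnalyzedWord> words) {
        List<String> terms = new ArrayList<>();
        for (AnalyzedWord w : words) {
            if (RELEVANT.contains(w.category)) {
                terms.add(w.romanized);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        List<AnalyzedWord> words = Arrays.asList(
                new AnalyzedWord("fiy", "PREP"),
                new AnalyzedWord("kitAb", "NOUN"),
                new AnalyzedWord("jadiyd", "ADJ"));
        System.out.println(relevantTerms(words)); // prints "[kitAb, jadiyd]"
    }
}
```

Dropping function words such as prepositions in this way plays a role similar to a stop-word list, but the decision is driven by the morphological category instead of a fixed vocabulary.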

getAnalyzerType

protected java.lang.String getAnalyzerType()

Specified by:: getAnalyzerType in class AbstractAnalyzer

See Also:: fr.gouv.culture.sdx.search.lucene.analysis.AbstractAnalyzer#getAnalyzerType()

tokenStream

public org.apache.lucene.analysis.TokenStream tokenStream(java.io.Reader reader)

Deprecated. Use tokenStream(String, Reader) instead.

Creates a TokenStream which tokenizes all the text in the provided Reader. Provided for backward compatibility only.

See Also:: Analyzer.tokenStream(java.io.Reader)
