Glosser_ar_en (SDX 2.4.1 API)

fr.gouv.culture.sdx.search.lucene.analysis
Class Glosser_ar_en

java.lang.Object
  org.apache.lucene.analysis.Analyzer
      fr.gouv.culture.sdx.search.lucene.analysis.AbstractAnalyzer
          fr.gouv.culture.sdx.search.lucene.analysis.Glosser_ar_en

All Implemented Interfaces:: Analyzer, java.io.Serializable, org.apache.avalon.framework.configuration.Configurable, org.apache.avalon.framework.logger.LogEnabled, org.apache.excalibur.xml.sax.XMLizable

public final class Glosser_ar_en
extends AbstractAnalyzer

An English glosser for the Arabic language. This glosser uses Tim Buckwalter's algorithm (available from the LDC Catalog) to identify the morphological category of Arabic tokens and then returns their glosses. The list of semantically meaningful morphological categories is not yet final, but the current list gives good results.

Author:: Pierrick Brihaye, 2003
See Also:: Serialized Form
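The glossing step described above can be pictured with the toy sketch below. It is not Buckwalter's analyzer and not the SDX implementation: the class GlossSketch, the three-entry lexicon (keyed by Buckwalter-transliterated tokens), the category labels and the set of "meaningful" categories are all hypothetical. It only shows the general idea of looking up every analysis of a token and keeping the glosses of those whose category is on the list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative sketch only: a toy glosser over a tiny, hypothetical lexicon. */
public class GlossSketch {

    /** One hypothetical morphological analysis: a category plus an English gloss. */
    static final class Analysis {
        final String category;
        final String gloss;

        Analysis(String category, String gloss) {
            this.category = category;
            this.gloss = gloss;
        }
    }

    // Hypothetical lexicon keyed by Buckwalter-transliterated tokens.
    static final Map<String, List<Analysis>> LEXICON = new HashMap<>();
    static {
        LEXICON.put("ktb", Arrays.asList(
                new Analysis("VERB_PERFECT", "write"),
                new Analysis("NOUN", "books")));
        LEXICON.put("fy", Arrays.asList(new Analysis("PREP", "in")));
        LEXICON.put("byt", Arrays.asList(new Analysis("NOUN", "house")));
    }

    // Hypothetical list of categories treated as semantically meaningful.
    static final Set<String> MEANINGFUL =
            new HashSet<>(Arrays.asList("NOUN", "VERB_PERFECT"));

    /** Returns the glosses of every analysis whose category is meaningful. */
    static List<String> gloss(String text) {
        List<String> out = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            List<Analysis> analyses = LEXICON.get(token);
            if (analyses == null) {
                continue; // unknown token: this sketch simply skips it
            }
            for (Analysis a : analyses) {
                if (MEANINGFUL.contains(a.category)) {
                    out.add(a.gloss);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(gloss("ktb fy byt")); // [write, books, house]
    }
}
```

An ambiguous token such as "ktb" contributes one gloss per meaningful analysis, while the preposition "fy" is dropped because its category is not on the list.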

Field Summary
`protected static java.lang.String`	`ANALYZER_TYPE`
`static java.lang.String[]`	`STOP_WORDS` An array containing some common English words that are usually not useful for searching.

Fields inherited from class fr.gouv.culture.sdx.search.lucene.analysis.AbstractAnalyzer
`logger`

Constructor Summary
`Glosser_ar_en()`

Method Summary
`void`	`configure(org.apache.avalon.framework.configuration.Configuration configuration)` Configure the glosser.
`void`	`enableLogging(org.apache.avalon.framework.logger.Logger logger)` Passes a logger (typically the one returned by super.getLog()) to this class.
`protected java.lang.String`	`getAnalyzerType()`
`org.apache.lucene.analysis.TokenStream`	`tokenStream(java.io.Reader reader)` Deprecated. Use tokenStream(String, Reader) instead.
`org.apache.lucene.analysis.TokenStream`	`tokenStream(java.lang.String fieldName, java.io.Reader reader)` Returns a token stream of glosses of Arabic words whose morphological categories are found to be semantically meaningful.

Methods inherited from class fr.gouv.culture.sdx.search.lucene.analysis.AbstractAnalyzer
`toSAX`

Methods inherited from class org.apache.lucene.analysis.Analyzer
`getPositionIncrementGap`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Field Detail

ANALYZER_TYPE

protected static final java.lang.String ANALYZER_TYPE

See Also:: Constant Field Values

STOP_WORDS

public static final java.lang.String[] STOP_WORDS

An array containing some common English words that are usually not useful for searching.
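A minimal sketch of how such a list might be applied to a sequence of glosses, assuming a case-insensitive match. The class name StopWordSketch and the six sample words are hypothetical; the actual contents of STOP_WORDS are not reproduced on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopWordSketch {

    // Hypothetical sample; not the real contents of Glosser_ar_en.STOP_WORDS.
    static final String[] SAMPLE_STOP_WORDS = {"a", "an", "the", "of", "to", "in"};

    // Build the set once so each lookup is a hash probe rather than an array scan.
    static final Set<String> STOP_SET = new HashSet<>(Arrays.asList(SAMPLE_STOP_WORDS));

    /** Drops glosses that appear (case-insensitively) in the stop set. */
    static List<String> removeStopWords(List<String> glosses) {
        List<String> kept = new ArrayList<>();
        for (String g : glosses) {
            if (!STOP_SET.contains(g.toLowerCase(Locale.ROOT))) {
                kept.add(g);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> glosses = Arrays.asList("the", "book", "of", "History");
        System.out.println(removeStopWords(glosses)); // [book, History]
    }
}
```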

Constructor Detail

Glosser_ar_en

public Glosser_ar_en()

Method Detail

getAnalyzerType

protected java.lang.String getAnalyzerType()

Specified by:: getAnalyzerType in class AbstractAnalyzer

configure

public void configure(org.apache.avalon.framework.configuration.Configuration configuration)
               throws org.apache.avalon.framework.configuration.ConfigurationException

Configure the glosser.

Specified by:: configure in interface org.apache.avalon.framework.configuration.Configurable
Overrides:: configure in class AbstractAnalyzer

Parameters:: configuration - The configuration object
Throws:: org.apache.avalon.framework.configuration.ConfigurationException - If a problem occurs during configuration

enableLogging

public void enableLogging(org.apache.avalon.framework.logger.Logger logger)

Passes a logger (typically the one returned by super.getLog()) to this class.

Specified by:: enableLogging in interface org.apache.avalon.framework.logger.LogEnabled
Overrides:: enableLogging in class AbstractAnalyzer

Parameters:: logger - The logger to use, typically super.getLog()

tokenStream

public org.apache.lucene.analysis.TokenStream tokenStream(java.lang.String fieldName,
                                                          java.io.Reader reader)

Returns a token stream of glosses of Arabic words whose morphological categories are found to be semantically meaningful.

Specified by:: tokenStream in interface Analyzer
Specified by:: tokenStream in class org.apache.lucene.analysis.Analyzer

Parameters:: fieldName - The name of the field being tokenized
reader - The reader supplying the text to gloss
Returns:: The token stream

tokenStream

public org.apache.lucene.analysis.TokenStream tokenStream(java.io.Reader reader)

Deprecated. Use tokenStream(String, Reader) instead.

Creates a TokenStream which tokenizes all the text in the provided Reader. Provided for backward compatibility only.

See Also:: Analyzer.tokenStream(java.io.Reader)
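The self-contained sketch below illustrates a common backward-compatibility arrangement, assuming the deprecated one-argument form simply forwards to the two-argument form with a null field name. BaseAnalyzer and ToyAnalyzer are hypothetical stand-ins that return a List<String> instead of a Lucene TokenStream; this is not the SDX or Lucene source.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class DelegationSketch {

    /** Hypothetical stand-in for an analyzer base class; returns tokens as a list. */
    abstract static class BaseAnalyzer {
        /** Preferred entry point; fieldName may be null when the caller has none. */
        abstract List<String> tokenStream(String fieldName, Reader reader);

        /** @deprecated use {@link #tokenStream(String, Reader)} instead. */
        @Deprecated
        List<String> tokenStream(Reader reader) {
            // Backward compatibility: forward to the two-argument form with no field name.
            return tokenStream(null, reader);
        }
    }

    /** Hypothetical toy analyzer: splits on whitespace and ignores the field name. */
    static final class ToyAnalyzer extends BaseAnalyzer {
        @Override
        List<String> tokenStream(String fieldName, Reader reader) {
            StringBuilder text = new StringBuilder();
            char[] buf = new char[256];
            try {
                int n;
                while ((n = reader.read(buf)) != -1) {
                    text.append(buf, 0, n);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            List<String> tokens = new ArrayList<>();
            for (String t : text.toString().trim().split("\\s+")) {
                if (!t.isEmpty()) {
                    tokens.add(t);
                }
            }
            return tokens;
        }
    }

    public static void main(String[] args) {
        BaseAnalyzer analyzer = new ToyAnalyzer();
        System.out.println(analyzer.tokenStream("content", new StringReader("ktb fy byt"))); // [ktb, fy, byt]
        System.out.println(analyzer.tokenStream(new StringReader("ktb fy byt")));            // [ktb, fy, byt]
    }
}
```

With this arrangement a subclass only implements the two-argument method, and older callers that still use the one-argument form receive the same tokens.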
