DefaultAnalyzer (SDX 2.4.1 API)

fr.gouv.culture.sdx.search.lucene.analysis
Class DefaultAnalyzer

java.lang.Object
  org.apache.lucene.analysis.Analyzer
      fr.gouv.culture.sdx.search.lucene.analysis.AbstractAnalyzer
          fr.gouv.culture.sdx.search.lucene.analysis.DefaultAnalyzer

All Implemented Interfaces: Analyzer, java.io.Serializable, org.apache.avalon.framework.configuration.Configurable, org.apache.avalon.framework.logger.LogEnabled, org.apache.excalibur.xml.sax.XMLizable

Direct Known Subclasses: Analyzer_br, Analyzer_cn, Analyzer_cz, Analyzer_de, Analyzer_en, Analyzer_fr, Analyzer_ru

public class DefaultAnalyzer
extends AbstractAnalyzer

A default Lucene analyzer used by SDX.

See Also: Serialized Form

Field Summary
`protected static java.lang.String`	`ANALYZER_TYPE`
`protected static java.lang.String`	`ATTRIBUTE_EXCLUDE_STEMS` The attribute indicating whether stemming exclusion words are used.
`protected static java.lang.String`	`ATTRIBUTE_USE_STOP_WORDS` The attribute indicating whether stop words are used.
`static java.lang.String[]`	`DEFAULT_STOP_WORDS` An array containing some common English words that are not usually useful for searching.
`protected java.lang.String`	`EXCLUDE_STEM_ELEMENT` String representation of the element name in the analyzer config file
`protected java.lang.String`	`EXCLUDE_STEMS_ELEMENT` String representation of the element name in the analyzer config file
`protected java.util.Set`	`excludeTable` The table for stemming exclusions
`protected java.util.Set`	`stopTable` The list of stop words used.

Fields inherited from class fr.gouv.culture.sdx.search.lucene.analysis.AbstractAnalyzer
`logger`

Constructor Summary
`DefaultAnalyzer()` Builds a default analyzer.

Method Summary
`protected java.util.Set`	`buildExcludeTable(org.apache.avalon.framework.configuration.Configuration conf)` Builds a stemming exclusion table from a configuration.
`protected java.util.Set`	`buildStopTable(org.apache.avalon.framework.configuration.Configuration conf)` Builds a stop word table from a configuration.
`void`	`configure(org.apache.avalon.framework.configuration.Configuration configuration)` Configures this analyzer.
`protected java.lang.String`	`getAnalyzerType()`
`protected java.lang.String[]`	`getDefaultStopWords()` Returns a default list of stop words.
`org.apache.lucene.analysis.TokenStream`	`tokenStream(java.io.Reader reader)` Deprecated. Use tokenStream(String, Reader) instead.
`org.apache.lucene.analysis.TokenStream`	`tokenStream(java.lang.String fieldName, java.io.Reader reader)` Filters LowerCaseTokenizer with StopFilter.

Methods inherited from class fr.gouv.culture.sdx.search.lucene.analysis.AbstractAnalyzer
`enableLogging, toSAX`

Methods inherited from class org.apache.lucene.analysis.Analyzer
`getPositionIncrementGap`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Field Detail

ATTRIBUTE_USE_STOP_WORDS

protected static final java.lang.String ATTRIBUTE_USE_STOP_WORDS

The attribute indicating whether stop words are used.

See Also: Constant Field Values

ATTRIBUTE_EXCLUDE_STEMS

protected static final java.lang.String ATTRIBUTE_EXCLUDE_STEMS

The attribute indicating whether stemming exclusion words are used.

See Also: Constant Field Values

ANALYZER_TYPE

protected static final java.lang.String ANALYZER_TYPE

See Also: Constant Field Values

stopTable

protected java.util.Set stopTable

The list of stop words used.

excludeTable

protected java.util.Set excludeTable

The table for stemming exclusions

EXCLUDE_STEMS_ELEMENT

protected final java.lang.String EXCLUDE_STEMS_ELEMENT

String representation of the element name in the analyzer config file

See Also: Constant Field Values

EXCLUDE_STEM_ELEMENT

protected final java.lang.String EXCLUDE_STEM_ELEMENT

String representation of the element name in the analyzer config file

See Also: Constant Field Values

DEFAULT_STOP_WORDS

public static final java.lang.String[] DEFAULT_STOP_WORDS

An array containing some common English words that are not usually useful for searching.

Constructor Detail

DefaultAnalyzer

public DefaultAnalyzer()

Builds a default analyzer.

This analyzer will use Lucene's StopAnalyzer.

Method Detail

getAnalyzerType

protected java.lang.String getAnalyzerType()

Specified by: getAnalyzerType in class AbstractAnalyzer

See Also: fr.gouv.culture.sdx.search.lucene.analysis.AbstractAnalyzer#getAnalyzerType()

configure

public void configure(org.apache.avalon.framework.configuration.Configuration configuration)
               throws org.apache.avalon.framework.configuration.ConfigurationException

Configures this analyzer.

The class will search for <stopWord> elements and use them as a stop word list. If none is found or the configuration object is null, the default list will be used.

If the top-level element <configuration> has a false value for its useStopWords attribute, no stop words will be used.

Specified by: configure in interface org.apache.avalon.framework.configuration.Configurable
Overrides: configure in class AbstractAnalyzer

Throws: org.apache.avalon.framework.configuration.ConfigurationException
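
The stop word selection rules described for configure can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the SDX implementation: the class `StopWordSelection`, the method `select`, and the `SAMPLE_DEFAULTS` list are hypothetical, and the Avalon Configuration object is replaced by a boolean for the useStopWords attribute plus a nullable list of words read from <stopWord> elements.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class StopWordSelection {
    // Hypothetical stand-in for DEFAULT_STOP_WORDS; the real list is not reproduced here.
    static final String[] SAMPLE_DEFAULTS = {"a", "and", "the"};

    /**
     * @param useStopWords value of the useStopWords attribute
     * @param configured   words found in stopWord elements, or null when no configuration is given
     */
    static Set<String> select(boolean useStopWords, List<String> configured) {
        if (!useStopWords) {
            // Stop words are switched off entirely.
            return Collections.emptySet();
        }
        if (configured == null || configured.isEmpty()) {
            // Nothing configured: fall back to the default list.
            return new LinkedHashSet<>(Arrays.asList(SAMPLE_DEFAULTS));
        }
        // Use exactly the configured words.
        return new LinkedHashSet<>(configured);
    }

    public static void main(String[] args) {
        System.out.println(select(true, null));
        System.out.println(select(true, Arrays.asList("le", "la")));
        System.out.println(select(false, Arrays.asList("le", "la")));
    }
}
```

The three calls in `main` cover the documented cases in order: no configuration, an explicit list, and stop words disabled.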

tokenStream

public org.apache.lucene.analysis.TokenStream tokenStream(java.lang.String fieldName,
                                                          java.io.Reader reader)

Filters LowerCaseTokenizer with StopFilter.

Specified by: tokenStream in interface Analyzer
Specified by: tokenStream in class org.apache.lucene.analysis.Analyzer
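
To make "Filters LowerCaseTokenizer with StopFilter" concrete, the following sketch imitates that chain with the Java standard library only. It does not call Lucene; the class `LowerCaseStopFilterSketch` and its methods are hypothetical names. It relies on the documented behavior of Lucene's LowerCaseTokenizer (split text at non-letters and lowercase the tokens) and of StopFilter (drop tokens found in the stop set).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class LowerCaseStopFilterSketch {
    /** Splits text at non-letter characters, lowercases each token, and drops stop words. */
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else {
                // A non-letter ends the current token, if any.
                flush(current, stopWords, tokens);
            }
        }
        flush(current, stopWords, tokens);
        return tokens;
    }

    // Emits the pending token unless it is empty or a stop word, then resets the buffer.
    private static void flush(StringBuilder current, Set<String> stopWords, List<String> out) {
        if (current.length() > 0) {
            String token = current.toString();
            if (!stopWords.contains(token)) {
                out.add(token);
            }
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<>(Arrays.asList("the", "of"));
        System.out.println(analyze("The History of SDX-2", stop)); // prints [history, sdx]
    }
}
```

Because tokens are lowercased before the stop set is consulted, the stop words themselves need to be stored in lower case for the filter to match.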

buildStopTable

protected java.util.Set buildStopTable(org.apache.avalon.framework.configuration.Configuration conf)
                                throws SDXException,
                                       org.apache.avalon.framework.configuration.ConfigurationException

Builds a stop word table from a configuration.

Parameters: conf - The configuration to use.
Returns: Set
Throws: SDXException; org.apache.avalon.framework.configuration.ConfigurationException

getDefaultStopWords

protected java.lang.String[] getDefaultStopWords()

Returns a default list of stop words.

buildExcludeTable

protected java.util.Set buildExcludeTable(org.apache.avalon.framework.configuration.Configuration conf)
                                   throws org.apache.avalon.framework.configuration.ConfigurationException

Builds a stemming exclusion table from a configuration.

Parameters: conf - The configuration to use.
Returns: Set
Throws: org.apache.avalon.framework.configuration.ConfigurationException
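
This page does not say how the exclusion table is consumed. In Lucene's stemming analyzers, such a set conventionally lists words that must bypass the stemmer, and the sketch below illustrates that convention only. The class `StemExclusionSketch`, the method `stemUnlessExcluded`, and the toy stemmer are all hypothetical and are not part of SDX or Lucene.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class StemExclusionSketch {
    // Toy stemmer (hypothetical): strips one trailing "s" from words longer than three letters.
    static String toyStem(String word) {
        return (word.length() > 3 && word.endsWith("s"))
                ? word.substring(0, word.length() - 1)
                : word;
    }

    // Words present in the exclusion table are returned unchanged; all others are stemmed.
    static String stemUnlessExcluded(String word, Set<String> excludeTable) {
        return excludeTable.contains(word) ? word : toyStem(word);
    }

    public static void main(String[] args) {
        Set<String> exclude = new HashSet<>(Arrays.asList("paris"));
        System.out.println(stemUnlessExcluded("archives", exclude)); // prints archive
        System.out.println(stemUnlessExcluded("paris", exclude));    // prints paris
    }
}
```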

tokenStream

public org.apache.lucene.analysis.TokenStream tokenStream(java.io.Reader reader)

Deprecated. Use tokenStream(String, Reader) instead.

Creates a TokenStream which tokenizes all the text in the provided Reader. Provided for backward compatibility only.

See Also: Analyzer.tokenStream(java.io.Reader)
