fr.gouv.culture.sdx.search.lucene.analysis
Class DefaultAnalyzer

java.lang.Object
  extended by org.apache.lucene.analysis.Analyzer
      extended by fr.gouv.culture.sdx.search.lucene.analysis.AbstractAnalyzer
          extended by fr.gouv.culture.sdx.search.lucene.analysis.DefaultAnalyzer
All Implemented Interfaces:
Analyzer, java.io.Serializable, org.apache.avalon.framework.configuration.Configurable, org.apache.avalon.framework.logger.LogEnabled, org.apache.excalibur.xml.sax.XMLizable
Direct Known Subclasses:
Analyzer_br, Analyzer_cn, Analyzer_cz, Analyzer_de, Analyzer_en, Analyzer_fr, Analyzer_ru

public class DefaultAnalyzer
extends AbstractAnalyzer

A default Lucene analyzer used by SDX.

See Also:
Serialized Form

Field Summary
protected static java.lang.String ANALYZER_TYPE
           
protected static java.lang.String ATTRIBUTE_EXCLUDE_STEMS
          The attribute indicating whether stemming exclusion words are used.
protected static java.lang.String ATTRIBUTE_USE_STOP_WORDS
          The attribute indicating whether stop words are used.
static java.lang.String[] DEFAULT_STOP_WORDS
          An array containing some common English words that are not usually useful for searching.
protected  java.lang.String EXCLUDE_STEM_ELEMENT
          String representation of the element name in the analyzer config file
protected  java.lang.String EXCLUDE_STEMS_ELEMENT
          String representation of the element name in the analyzer config file
protected  java.util.Set excludeTable
          The table for stemming exclusions
protected  java.util.Set stopTable
          The list of stop words used.
 
Fields inherited from class fr.gouv.culture.sdx.search.lucene.analysis.AbstractAnalyzer
logger
 
Constructor Summary
DefaultAnalyzer()
          Builds a default analyzer.
 
Method Summary
protected  java.util.Set buildExcludeTable(org.apache.avalon.framework.configuration.Configuration conf)
          Builds a stemming exclusion table from a configuration.
protected  java.util.Set buildStopTable(org.apache.avalon.framework.configuration.Configuration conf)
          Builds a stop word table from a configuration.
 void configure(org.apache.avalon.framework.configuration.Configuration configuration)
          Configures this analyzer.
protected  java.lang.String getAnalyzerType()
           
protected  java.lang.String[] getDefaultStopWords()
          Returns a default list of stop words.
 org.apache.lucene.analysis.TokenStream tokenStream(java.io.Reader reader)
          Deprecated. Use tokenStream(String, Reader) instead.
 org.apache.lucene.analysis.TokenStream tokenStream(java.lang.String fieldName, java.io.Reader reader)
          Filters LowerCaseTokenizer with StopFilter.
 
Methods inherited from class fr.gouv.culture.sdx.search.lucene.analysis.AbstractAnalyzer
enableLogging, toSAX
 
Methods inherited from class org.apache.lucene.analysis.Analyzer
getPositionIncrementGap
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

ATTRIBUTE_USE_STOP_WORDS

protected static final java.lang.String ATTRIBUTE_USE_STOP_WORDS
The attribute indicating whether stop words are used.

See Also:
Constant Field Values

ATTRIBUTE_EXCLUDE_STEMS

protected static final java.lang.String ATTRIBUTE_EXCLUDE_STEMS
The attribute indicating whether stemming exclusion words are used.

See Also:
Constant Field Values

ANALYZER_TYPE

protected static final java.lang.String ANALYZER_TYPE
See Also:
Constant Field Values

stopTable

protected java.util.Set stopTable
The list of stop words used.


excludeTable

protected java.util.Set excludeTable
The table for stemming exclusions


EXCLUDE_STEMS_ELEMENT

protected final java.lang.String EXCLUDE_STEMS_ELEMENT
String representation of the element name in the analyzer config file

See Also:
Constant Field Values

EXCLUDE_STEM_ELEMENT

protected final java.lang.String EXCLUDE_STEM_ELEMENT
String representation of the element name in the analyzer config file

See Also:
Constant Field Values

DEFAULT_STOP_WORDS

public static final java.lang.String[] DEFAULT_STOP_WORDS
An array containing some common English words that are not usually useful for searching.

Constructor Detail

DefaultAnalyzer

public DefaultAnalyzer()
Builds a default analyzer.

This analyzer will use Lucene's StopAnalyzer.

Method Detail

getAnalyzerType

protected java.lang.String getAnalyzerType()
Specified by:
getAnalyzerType in class AbstractAnalyzer
See Also:
fr.gouv.culture.sdx.search.lucene.analysis.AbstractAnalyzer#getAnalyzerType()

configure

public void configure(org.apache.avalon.framework.configuration.Configuration configuration)
               throws org.apache.avalon.framework.configuration.ConfigurationException
Configures this analyzer.

The class searches for <stopWord> elements and uses them as the stop word list. If none are found, or if the configuration object is null, the default list will be used.

If the top-level element <configuration> has a false value for its useStopWords attribute, no stop words will be used.
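
The rules above can be sketched in plain Java. This is only an illustration of the described behaviour, not the SDX implementation: it parses XML with the JDK's DOM parser instead of an Avalon Configuration, the class and method names are hypothetical, FALLBACK_STOP_WORDS is a stand-in for DEFAULT_STOP_WORDS (whose contents are not listed on this page), and it assumes each stop word is the text content of a <stopWord> element. The root element's name is not checked.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical plain-Java illustration of the stop word rules described for configure(). */
final class StopWordConfigSketch {

    // Assumption: stand-in for DEFAULT_STOP_WORDS; the real array is not reproduced here.
    static final String[] FALLBACK_STOP_WORDS = {"a", "an", "the"};

    /** Returns the stop word set for an XML configuration string, or the fallback list when xml is null. */
    static Set<String> stopWordsFor(String xml) {
        if (xml == null) {
            // Null configuration: use the default list.
            return new LinkedHashSet<>(Arrays.asList(FALLBACK_STOP_WORDS));
        }
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Unparseable configuration", e);
        }

        // useStopWords="false" on the top-level element disables stop words entirely.
        if ("false".equals(root.getAttribute("useStopWords"))) {
            return Collections.emptySet();
        }

        // Collect every <stopWord> element below the root (at any depth).
        // Assumption: the word is the element's text content.
        Set<String> words = new LinkedHashSet<>();
        NodeList nodes = root.getElementsByTagName("stopWord");
        for (int i = 0; i < nodes.getLength(); i++) {
            String word = nodes.item(i).getTextContent().trim();
            if (!word.isEmpty()) {
                words.add(word);
            }
        }

        // No <stopWord> elements found: fall back to the default list.
        if (words.isEmpty()) {
            words.addAll(Arrays.asList(FALLBACK_STOP_WORDS));
        }
        return words;
    }
}
```

With this sketch, a configuration containing two <stopWord> elements yields exactly those two words, while the same configuration with useStopWords="false" on its root element yields an empty set.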

Specified by:
configure in interface org.apache.avalon.framework.configuration.Configurable
Overrides:
configure in class AbstractAnalyzer
Throws:
org.apache.avalon.framework.configuration.ConfigurationException

tokenStream

public org.apache.lucene.analysis.TokenStream tokenStream(java.lang.String fieldName,
                                                          java.io.Reader reader)
Filters LowerCaseTokenizer with StopFilter.
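
As a rough illustration of what this pipeline does, the sketch below reproduces the two steps in plain Java without Lucene: Lucene's LowerCaseTokenizer divides text at non-letter characters and lower-cases the tokens, and StopFilter removes tokens found in the stop word set. The class and method names are hypothetical, and the sketch returns a list rather than a TokenStream.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical plain-Java illustration of lower-case letter tokenization plus stop word removal. */
final class LowerCaseStopSketch {

    /** Splits text at non-letter characters, lower-cases each token, and drops stop words. */
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        // Iterate one position past the end so the final token is flushed.
        for (int i = 0; i <= text.length(); i++) {
            boolean letter = i < text.length() && Character.isLetter(text.charAt(i));
            if (letter) {
                // Tokenizer step: accumulate letters, lower-casing as we go.
                current.append(Character.toLowerCase(text.charAt(i)));
            } else if (current.length() > 0) {
                // Filter step: emit the finished token unless it is a stop word.
                String token = current.toString();
                if (!stopWords.contains(token)) {
                    tokens.add(token);
                }
                current.setLength(0);
            }
        }
        return tokens;
    }
}
```

For example, analyze("The Quick, brown fox!", ...) with a stop word set containing only "the" returns the tokens quick, brown and fox.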

Specified by:
tokenStream in interface Analyzer
Specified by:
tokenStream in class org.apache.lucene.analysis.Analyzer

buildStopTable

protected java.util.Set buildStopTable(org.apache.avalon.framework.configuration.Configuration conf)
                                throws SDXException,
                                       org.apache.avalon.framework.configuration.ConfigurationException
Builds a stop word table from a configuration.

Parameters:
conf - The configuration to use.
Returns:
The set of stop words built from the configuration.
Throws:
SDXException
org.apache.avalon.framework.configuration.ConfigurationException

getDefaultStopWords

protected java.lang.String[] getDefaultStopWords()
Returns a default list of stop words.


buildExcludeTable

protected java.util.Set buildExcludeTable(org.apache.avalon.framework.configuration.Configuration conf)
                                   throws org.apache.avalon.framework.configuration.ConfigurationException
Builds a stemming exclusion table from a configuration.

Parameters:
conf - The configuration to use.
Returns:
The set of stemming exclusions built from the configuration.
Throws:
org.apache.avalon.framework.configuration.ConfigurationException

tokenStream

public org.apache.lucene.analysis.TokenStream tokenStream(java.io.Reader reader)
Deprecated. Use tokenStream(String, Reader) instead.

Creates a TokenStream which tokenizes all the text in the provided Reader. Provided for backward compatibility only.

See Also:
Analyzer.tokenStream(java.io.Reader)


Copyright © 2000-2010 Ministere de la culture et de la communication / AJLSM. All Rights Reserved.