fr.gouv.culture.sdx.search.lucene.analysis
Class Analyzer_cz

java.lang.Object
  extended by org.apache.lucene.analysis.Analyzer
      extended by fr.gouv.culture.sdx.search.lucene.analysis.AbstractAnalyzer
          extended by fr.gouv.culture.sdx.search.lucene.analysis.DefaultAnalyzer
              extended by fr.gouv.culture.sdx.search.lucene.analysis.Analyzer_cz
All Implemented Interfaces:
Analyzer, java.io.Serializable, org.apache.avalon.framework.configuration.Configurable, org.apache.avalon.framework.logger.LogEnabled, org.apache.excalibur.xml.sax.XMLizable

public final class Analyzer_cz
extends DefaultAnalyzer

Analyzer for the Czech language. Supports an external list of stopwords (words that will not be indexed at all). A default set of stopwords is used unless an alternative list is specified. The exclusion list is empty by default.

Author:
Lukas Zapletal [lzap@root.cz]
See Also:
Serialized Form

Field Summary
protected static java.lang.String ANALYZER_TYPE
           
static java.lang.String[] DEFAULT_STOP_WORDS
          List of typical stopwords.
 
Fields inherited from class fr.gouv.culture.sdx.search.lucene.analysis.DefaultAnalyzer
ATTRIBUTE_EXCLUDE_STEMS, ATTRIBUTE_USE_STOP_WORDS, EXCLUDE_STEM_ELEMENT, EXCLUDE_STEMS_ELEMENT, excludeTable, stopTable
 
Fields inherited from class fr.gouv.culture.sdx.search.lucene.analysis.AbstractAnalyzer
logger
 
Constructor Summary
Analyzer_cz()
          Builds an analyzer.
Analyzer_cz(java.io.File stopwords)
          Builds an analyzer with the given stop words.
Analyzer_cz(java.util.Set stopwords)
          Builds an analyzer with the given stop words.
Analyzer_cz(java.lang.String[] stopwords)
          Builds an analyzer with the given stop words.
 
Method Summary
protected  java.lang.String getAnalyzerType()
           
 void loadStopWords(java.io.InputStream wordfile, java.lang.String encoding)
          Loads stopwords hash from resource stream (file, database...).
 org.apache.lucene.analysis.TokenStream tokenStream(java.lang.String fieldName, java.io.Reader reader)
          Creates a TokenStream which tokenizes all the text in the provided Reader.
 
Methods inherited from class fr.gouv.culture.sdx.search.lucene.analysis.DefaultAnalyzer
buildExcludeTable, buildStopTable, configure, getDefaultStopWords, tokenStream
 
Methods inherited from class fr.gouv.culture.sdx.search.lucene.analysis.AbstractAnalyzer
enableLogging, toSAX
 
Methods inherited from class org.apache.lucene.analysis.Analyzer
getPositionIncrementGap
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

ANALYZER_TYPE

protected static final java.lang.String ANALYZER_TYPE
See Also:
Constant Field Values

DEFAULT_STOP_WORDS

public static final java.lang.String[] DEFAULT_STOP_WORDS
List of typical stopwords.

Constructor Detail

Analyzer_cz

public Analyzer_cz()
Builds an analyzer.


Analyzer_cz

public Analyzer_cz(java.lang.String[] stopwords)
Builds an analyzer with the given stop words.

Parameters:
stopwords - the stop words to use

Analyzer_cz

public Analyzer_cz(java.util.Set stopwords)
Builds an analyzer with the given stop words.

Parameters:
stopwords - the stop words to use

Analyzer_cz

public Analyzer_cz(java.io.File stopwords)
            throws java.io.IOException
Builds an analyzer with the given stop words.

Parameters:
stopwords - file containing the stop words
Throws:
java.io.IOException
Method Detail

getAnalyzerType

protected java.lang.String getAnalyzerType()
Overrides:
getAnalyzerType in class DefaultAnalyzer
See Also:
fr.gouv.culture.sdx.search.lucene.analysis.AbstractAnalyzer#getAnalyzerType()

loadStopWords

public void loadStopWords(java.io.InputStream wordfile,
                          java.lang.String encoding)
Loads stopwords hash from resource stream (file, database...).

Parameters:
wordfile - Stream containing the word list
encoding - Encoding used (win-1250, iso-8859-2, ...), or null for the default system encoding
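
The parameters above can be illustrated with a minimal sketch in plain Java. It is not the SDX implementation: the class StopWordLoaderSketch, its load method, and the one-word-per-line file layout are assumptions made for illustration. It shows only the documented contract, that is, reading words from a stream with a named encoding and falling back to the platform default when the encoding is null.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Scanner;
import java.util.Set;

/** Hypothetical helper, not part of SDX. */
final class StopWordLoaderSketch {

    /**
     * Reads one stop word per line (assumed layout).
     * A null encoding selects the platform default charset.
     */
    static Set<String> load(InputStream wordfile, String encoding) {
        Set<String> words = new LinkedHashSet<>();
        try (Scanner scanner = (encoding == null)
                ? new Scanner(wordfile)
                : new Scanner(wordfile, encoding)) {
            while (scanner.hasNextLine()) {
                String word = scanner.nextLine().trim();
                if (!word.isEmpty()) { // skip blank lines
                    words.add(word);
                }
            }
            // Scanner swallows read errors, so surface them explicitly.
            IOException failure = scanner.ioException();
            if (failure != null) {
                throw new UncheckedIOException(failure);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        byte[] data = "a\nale\n\n nebo \n".getBytes(StandardCharsets.UTF_8);
        // prints [a, ale, nebo]
        System.out.println(load(new ByteArrayInputStream(data), "UTF-8"));
    }
}
```

An encoding name such as iso-8859-2 can be passed in the same way, provided the running JRE supports that charset.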

tokenStream

public final org.apache.lucene.analysis.TokenStream tokenStream(java.lang.String fieldName,
                                                                java.io.Reader reader)
Creates a TokenStream which tokenizes all the text in the provided Reader.

Specified by:
tokenStream in interface Analyzer
Overrides:
tokenStream in class DefaultAnalyzer
Returns:
A TokenStream built from a StandardTokenizer filtered with StandardFilter, StopFilter, GermanStemFilter and LowerCaseFilter
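
The effect of a tokenizer followed by lower-casing and stop word removal can be sketched in plain Java. This is an illustration only, and it does not use the Lucene classes named above. The class StopFilterSketch and its analyze method are hypothetical, stemming is omitted, and lower-casing is applied before the stop word check as a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical illustration, not part of SDX or Lucene. */
final class StopFilterSketch {

    /** Splits text into tokens, lower-cases them, and drops stop words. */
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        // Split on runs of characters that are neither letters nor digits.
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            if (!token.isEmpty() && !stopWords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("a", "v", "na"));
        // prints [pes, hrad, lese, kopci]
        System.out.println(analyze("Pes a hrad v lese na kopci", stopWords));
    }
}
```

Stop words never reach the output list, which mirrors the class description: such words are not indexed at all.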


Copyright © 2000-2010 Ministere de la culture et de la communication / AJLSM. All Rights Reserved.