Analyzer_cz (SDX 2.4.1 API)

fr.gouv.culture.sdx.search.lucene.analysis
Class Analyzer_cz

java.lang.Object
  org.apache.lucene.analysis.Analyzer
      fr.gouv.culture.sdx.search.lucene.analysis.AbstractAnalyzer
          fr.gouv.culture.sdx.search.lucene.analysis.DefaultAnalyzer
              fr.gouv.culture.sdx.search.lucene.analysis.Analyzer_cz

All Implemented Interfaces:: Analyzer, java.io.Serializable, org.apache.avalon.framework.configuration.Configurable, org.apache.avalon.framework.logger.LogEnabled, org.apache.excalibur.xml.sax.XMLizable

public final class Analyzer_cz
extends DefaultAnalyzer

Analyzer for the Czech language. Supports an external list of stopwords (words that will not be indexed at all). A default set of stopwords is used unless an alternative list is specified. The exclusion list is empty by default.

Author:: Lukas Zapletal [lzap@root.cz]
See Also:: Serialized Form
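
The stopword behaviour described above can be illustrated with a small, self-contained sketch. It is not the SDX or Lucene implementation: the class `StopWordSketch`, its `indexableTerms` method, the whitespace tokenization and the sample default words are all assumptions made for illustration. It only shows the idea that a default list applies unless an alternative list is supplied, and that stopwords never reach the index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; this is not the SDX or Lucene implementation.
final class StopWordSketch {

    // Placeholder defaults; the real DEFAULT_STOP_WORDS contents are not listed on this page.
    static final String[] SAMPLE_DEFAULTS = {"a", "ale", "i", "na"};

    private final Set<String> stopWords;

    // No argument: fall back to the default list.
    StopWordSketch() {
        this(SAMPLE_DEFAULTS);
    }

    // Alternative list supplied: it replaces the defaults.
    StopWordSketch(String[] stopWords) {
        this.stopWords = new HashSet<>(Arrays.asList(stopWords));
    }

    // Split on whitespace, lower-case each term, and drop every stopword.
    List<String> indexableTerms(String text) {
        List<String> terms = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            String term = raw.toLowerCase(Locale.ROOT);
            if (!term.isEmpty() && !stopWords.contains(term)) {
                terms.add(term);
            }
        }
        return terms;
    }
}
```

Calling `new StopWordSketch().indexableTerms(...)` uses the sample defaults, while `new StopWordSketch(new String[] {...})` mirrors the constructors that accept a caller-supplied list.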

Field Summary
`protected static java.lang.String`	`ANALYZER_TYPE`
`static java.lang.String[]`	`DEFAULT_STOP_WORDS` List of typical stopwords.

Fields inherited from class fr.gouv.culture.sdx.search.lucene.analysis.DefaultAnalyzer
`ATTRIBUTE_EXCLUDE_STEMS, ATTRIBUTE_USE_STOP_WORDS, EXCLUDE_STEM_ELEMENT, EXCLUDE_STEMS_ELEMENT, excludeTable, stopTable`

Fields inherited from class fr.gouv.culture.sdx.search.lucene.analysis.AbstractAnalyzer
`logger`

Constructor Summary
`Analyzer_cz()` Builds an analyzer.
`Analyzer_cz(java.io.File stopwords)` Builds an analyzer with the given stop words.
`Analyzer_cz(java.util.Set stopwords)` Builds an analyzer with the given stop words.
`Analyzer_cz(java.lang.String[] stopwords)` Builds an analyzer with the given stop words.

Method Summary
`protected java.lang.String`	`getAnalyzerType()`
`void`	`loadStopWords(java.io.InputStream wordfile, java.lang.String encoding)` Loads stopwords hash from resource stream (file, database...).
`org.apache.lucene.analysis.TokenStream`	`tokenStream(java.lang.String fieldName, java.io.Reader reader)` Creates a TokenStream which tokenizes all the text in the provided Reader.

Methods inherited from class fr.gouv.culture.sdx.search.lucene.analysis.DefaultAnalyzer
`buildExcludeTable, buildStopTable, configure, getDefaultStopWords, tokenStream`

Methods inherited from class fr.gouv.culture.sdx.search.lucene.analysis.AbstractAnalyzer
`enableLogging, toSAX`

Methods inherited from class org.apache.lucene.analysis.Analyzer
`getPositionIncrementGap`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Field Detail

ANALYZER_TYPE

protected static final java.lang.String ANALYZER_TYPE

See Also:: Constant Field Values

DEFAULT_STOP_WORDS

public static final java.lang.String[] DEFAULT_STOP_WORDS

List of typical stopwords.

Constructor Detail

Analyzer_cz

public Analyzer_cz()

Builds an analyzer.

Analyzer_cz

public Analyzer_cz(java.lang.String[] stopwords)

Builds an analyzer with the given stop words.

Parameters:: stopwords - the stop words to use in place of the default list

Analyzer_cz

public Analyzer_cz(java.util.Set stopwords)

Builds an analyzer with the given stop words.

Parameters:: stopwords - the stop words to use in place of the default list

Analyzer_cz

public Analyzer_cz(java.io.File stopwords)
            throws java.io.IOException

Builds an analyzer with the given stop words.

Parameters:: stopwords - file containing the stop words to use in place of the default list
Throws:: java.io.IOException

Method Detail

getAnalyzerType

protected java.lang.String getAnalyzerType()

Overrides:: getAnalyzerType in class DefaultAnalyzer

See Also:: fr.gouv.culture.sdx.search.lucene.analysis.AbstractAnalyzer#getAnalyserType()

loadStopWords

public void loadStopWords(java.io.InputStream wordfile,
                          java.lang.String encoding)

Loads stopwords hash from resource stream (file, database...).

Parameters:: wordfile - File containing the wordlist; encoding - Encoding used (win-1250, iso-8859-2, ...), or null for the default system encoding
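
As a rough sketch of what loading a word list from a stream with an optional encoding can look like, the following uses only the Java standard library. It does not reproduce the SDX method body: the class `StopWordLoaderSketch`, its `load` method and the one-word-per-line file layout are assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper; it does not reproduce the SDX method body.
final class StopWordLoaderSketch {

    // Reads one word per line (an assumed file layout) and returns the words as a set.
    static Set<String> load(InputStream wordfile, String encoding) throws IOException {
        // A null encoding selects the platform default charset.
        Reader decoder = (encoding == null)
                ? new InputStreamReader(wordfile)
                : new InputStreamReader(wordfile, encoding);
        Set<String> words = new LinkedHashSet<>();
        try (BufferedReader in = new BufferedReader(decoder)) {
            String line;
            while ((line = in.readLine()) != null) {
                String word = line.trim();
                // Skip blank lines so they do not become empty stopwords.
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return words;
    }
}
```

In this sketch the encoding string is passed straight to `InputStreamReader`, so it has to be a charset name the Java runtime recognises; an unsupported name raises `UnsupportedEncodingException`, which is a subclass of `IOException`.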

tokenStream

public final org.apache.lucene.analysis.TokenStream tokenStream(java.lang.String fieldName,
                                                                java.io.Reader reader)

Creates a TokenStream which tokenizes all the text in the provided Reader.

Specified by:: tokenStream in interface Analyzer
Overrides:: tokenStream in class DefaultAnalyzer

Returns:: A TokenStream built from a StandardTokenizer filtered with StandardFilter, StopFilter, GermanStemFilter and LowerCaseFilter
