ArabicStemAnalyzer

gpl.pierrick.brihaye.aramorph.lucene
Class ArabicStemAnalyzer

java.lang.Object
  org.apache.lucene.analysis.Analyzer
      gpl.pierrick.brihaye.aramorph.lucene.ArabicStemAnalyzer

public final class ArabicStemAnalyzer
extends org.apache.lucene.analysis.Analyzer

Analyzer for the Arabic language. It uses Tim Buckwalter's algorithm (available from the LDC Catalog) to identify the morphological category of each Arabic token. The set of grammatical categories treated as significant is still to be finalized, but the current list gives good results. The tokens finally emitted are a romanized version of the canonical word.

Author: Pierrick Brihaye, 2003

Field Summary
`protected boolean`	`outputBuckwalter` Whether the analyzer should output tokens in the Buckwalter transliteration system.

Constructor Summary
`ArabicStemAnalyzer()` Constructs an analyzer that returns grammatically significant Arabic tokens in the Buckwalter transliteration system.
`ArabicStemAnalyzer(boolean outputBuckwalter)` Constructs an analyzer that returns grammatically significant Arabic tokens; `outputBuckwalter` controls whether they are output in the Buckwalter transliteration system.

Method Summary
`org.apache.lucene.analysis.TokenStream`	`tokenStream(java.lang.String FieldName, java.io.Reader reader)` Returns a token stream of Arabic words whose grammatical categories are found to be significant.

Methods inherited from class org.apache.lucene.analysis.Analyzer

tokenStream

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail

outputBuckwalter

protected boolean outputBuckwalter

Whether the analyzer should output tokens in the Buckwalter transliteration system.
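
To give a rough idea of what output in the Buckwalter transliteration system looks like, the following standalone sketch maps a handful of Arabic letters to their Buckwalter ASCII equivalents. It is not AraMorph's own code: the class `BuckwalterSketch`, its `TABLE`, and the `transliterate` method are hypothetical names introduced for illustration, and the table is deliberately partial (the full scheme also covers hamza forms, diacritics, and other letters).

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: a partial Arabic-to-Buckwalter mapping, not AraMorph's implementation. */
final class BuckwalterSketch {

    // Hypothetical lookup table covering a subset of the Buckwalter scheme.
    private static final Map<Character, Character> TABLE = new HashMap<>();
    static {
        TABLE.put('\u0627', 'A'); // alif
        TABLE.put('\u0628', 'b'); // ba
        TABLE.put('\u062A', 't'); // ta
        TABLE.put('\u062B', 'v'); // tha
        TABLE.put('\u062C', 'j'); // jim
        TABLE.put('\u062D', 'H'); // Ha
        TABLE.put('\u062E', 'x'); // kha
        TABLE.put('\u062F', 'd'); // dal
        TABLE.put('\u0631', 'r'); // ra
        TABLE.put('\u0633', 's'); // sin
        TABLE.put('\u0634', '$'); // shin
        TABLE.put('\u0643', 'k'); // kaf
        TABLE.put('\u0644', 'l'); // lam
        TABLE.put('\u0645', 'm'); // mim
        TABLE.put('\u0646', 'n'); // nun
        TABLE.put('\u0648', 'w'); // waw
        TABLE.put('\u064A', 'y'); // ya
    }

    /** Replaces every mapped Arabic letter; characters outside the table pass through unchanged. */
    static String transliterate(String arabic) {
        StringBuilder sb = new StringBuilder(arabic.length());
        for (int i = 0; i < arabic.length(); i++) {
            char c = arabic.charAt(i);
            char out = TABLE.getOrDefault(c, c);
            sb.append(out);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // kaf, ta, alif, ba (the word "kitab", written without short vowels)
        System.out.println(transliterate("\u0643\u062A\u0627\u0628")); // prints "ktAb"
    }
}
```

Because Buckwalter transliteration is a one-to-one, ASCII-only encoding, tokens in this form can be stored and queried without Unicode-aware tooling, which is presumably why the analyzer offers it as an output option.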

Constructor Detail