gpl.pierrick.brihaye.aramorph.lucene
Class ArabicTokenizer
java.lang.Object
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.Tokenizer
gpl.pierrick.brihaye.aramorph.lucene.ArabicTokenizer
- public class ArabicTokenizer
- extends org.apache.lucene.analysis.Tokenizer
A tokenizer that returns tokens made of characters in the Arabic alphabet.
This tokenizer is rather blunt: it also filters out digits and punctuation,
even when they occur inside an Arabic part of the stream. I plan to write a
"universal", highly configurable character tokenizer.
- Author:
- Pierrick Brihaye, 2003
Fields inherited from class org.apache.lucene.analysis.Tokenizer
input
Constructor Summary
ArabicTokenizer(java.io.Reader input)
Constructs a tokenizer that will return tokens in the Arabic alphabet.
ArabicTokenizer(java.io.Reader input, boolean debug)
Constructs a tokenizer that will return tokens in the Arabic alphabet.
Method Summary
protected boolean isArabicChar(char c)
Whether or not a character is in the Arabic alphabet.
org.apache.lucene.analysis.Token next()
Returns the next token in the stream, or null at EOS.
Methods inherited from class org.apache.lucene.analysis.Tokenizer
close
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
ArabicTokenizer
public ArabicTokenizer(java.io.Reader input)
- Constructs a tokenizer that will return tokens in the Arabic alphabet.
- Parameters:
input
- The reader
ArabicTokenizer
public ArabicTokenizer(java.io.Reader input,
boolean debug)
- Constructs a tokenizer that will return tokens in the Arabic alphabet.
- Parameters:
input
- The reader
debug
- Whether or not the tokenizer should display convenience messages on System.out
isArabicChar
protected boolean isArabicChar(char c)
- Whether or not a character is in the Arabic alphabet.
- Parameters:
c
- The character to test
- Returns:
- true if the character is in the Arabic alphabet, false otherwise
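The page does not say how isArabicChar decides membership. As a rough illustration only, the sketch below assumes that "in the Arabic alphabet" can be approximated as "a letter in the Unicode Arabic block (U+0600 to U+06FF)". The class name ArabicCharSketch and the method isArabicLetter are hypothetical and are not part of this package. The real method may accept a different set of characters.

```java
public class ArabicCharSketch {

    // Assumption: a character counts as Arabic when it lies in the Unicode
    // Arabic block and is classified as a letter. Digits and punctuation in
    // that block are rejected, matching the filtering described for the class.
    public static boolean isArabicLetter(char c) {
        return Character.UnicodeBlock.of(c) == Character.UnicodeBlock.ARABIC
                && Character.isLetter(c);
    }

    public static void main(String[] args) {
        System.out.println(isArabicLetter('\u0628')); // ARABIC LETTER BEH
        System.out.println(isArabicLetter('\u0661')); // ARABIC-INDIC DIGIT ONE
        System.out.println(isArabicLetter('\u060C')); // ARABIC COMMA
        System.out.println(isArabicLetter('a'));      // LATIN SMALL LETTER A
    }
}
```

Under this assumption only the first call reports an Arabic letter. The digit and the comma belong to the Arabic block but are not letters.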
next
public org.apache.lucene.analysis.Token next()
throws java.io.IOException
- Returns the next token in the stream, or null at EOS.
- Returns:
- The token with its type set to ARABIC
- Throws:
java.io.IOException
- If an error occurs while reading the input
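Because next() signals end of stream with null, callers typically loop until null is returned. The sketch below shows that calling pattern with a hypothetical stand-in class, SketchTokenizer, so that it runs without Lucene on the classpath. It returns plain String values rather than org.apache.lucene.analysis.Token objects, and its character test (letters in the Unicode Arabic block) is an assumption, not the behavior documented for this class.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class NextLoopSketch {

    /** Hypothetical stand-in for the tokenizer; returns String, not a Lucene Token. */
    static class SketchTokenizer {
        private final Reader input;

        SketchTokenizer(Reader input) {
            this.input = input;
        }

        // Assumption: letters in the Unicode Arabic block form tokens;
        // every other character (digits, punctuation, Latin text) separates them.
        private static boolean isTokenChar(int c) {
            return Character.UnicodeBlock.of(c) == Character.UnicodeBlock.ARABIC
                    && Character.isLetter(c);
        }

        /** Returns the next run of token characters, or null at end of stream. */
        String next() throws IOException {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = input.read()) != -1) {
                if (isTokenChar(c)) {
                    sb.append((char) c);
                } else if (sb.length() > 0) {
                    break; // a separator ends the current token
                }
            }
            return sb.length() > 0 ? sb.toString() : null;
        }
    }

    /** Drains the stand-in tokenizer using the "loop until null" pattern. */
    public static List<String> tokenize(String text) {
        SketchTokenizer tokenizer = new SketchTokenizer(new StringReader(text));
        List<String> tokens = new ArrayList<>();
        try {
            String token;
            while ((token = tokenizer.next()) != null) {
                tokens.add(token);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Two Arabic words separated by digits, punctuation and a Latin word.
        System.out.println(tokenize("\u0643\u062A\u0627\u0628 2003, book \u0642\u0644\u0645"));
    }
}
```

With the real class, the same loop would read a Token from each call to next() and stop on null. The sketch keeps only the two Arabic words and drops the digits, the comma and the Latin word.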