Interface Summary | |
---|---|
ResettableTokenizer | Implementations that are prepared for reuse should implement this interface. |
Token | A Token represents a part of a tokenized text. |
Tokenizer | An implementation splits text data into its Token instances. |
Class Summary | |
---|---|
AbstractToken | Simple implementation for reuse. |
CaseTokenFilter | Transforms the input token to an upper- or lower-cased form for a given Locale. |
ResettableTokenFilter | A token filter that supports resettable tokenizers. |
SentenceTokenizer | Tokenizes a text into sentences. |
StopTokenFilter | Filters stop words out of the token stream. |
SurroundingTokenFilter | Removes surrounding braces and other characters around a token text. |
TextBuilder | Creates new text sequences from lists of Token or CharSequence. |
TextBuilder.SpaceBuilder | Simple implementation that concatenates the texts of all tokens, delimited by a space (ASCII 0x20). |
TokenFilter | Main class to support Tokenizer chaining, also known as the decorator pattern. |
TokenizerSupplier | Support class for ResettableTokenizer. |
WhitespaceTokenizer | Tokenizes a sequence of chars at whitespace. |
Exception Summary | |
---|---|
TokenizerException | Thrown if something goes wrong during tokenization. |

The package contains classes and patterns to support the analysis of CharSequence. Implementations of Tokenizer are the basis for splitting a CharSequence into Token instances. TextBuilder builds a new CharSequence from a list of CharSequence or Token instances.
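
The split-and-rebuild round trip described above can be sketched as follows. This is a minimal, self-contained illustration, not the package's API: the class TokenizeSketch, the nested SimpleToken type, and the methods tokenize and build are hypothetical stand-ins that only mirror the roles of WhitespaceTokenizer, Token, and TextBuilder.SpaceBuilder.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenizeSketch {

    /** Hypothetical stand-in for a Token: a piece of text plus its start offset in the input. */
    public static final class SimpleToken {
        public final String text;
        public final int start;

        SimpleToken(String text, int start) {
            this.text = text;
            this.start = start;
        }
    }

    /** Splits the input at whitespace (an assumption about what a whitespace tokenizer does). */
    public static List<SimpleToken> tokenize(CharSequence input) {
        List<SimpleToken> tokens = new ArrayList<>();
        int start = -1; // -1 means "currently not inside a token"
        for (int i = 0; i <= input.length(); i++) {
            boolean boundary = i == input.length() || Character.isWhitespace(input.charAt(i));
            if (boundary) {
                if (start >= 0) {
                    // Close the token that ends just before this boundary.
                    tokens.add(new SimpleToken(input.subSequence(start, i).toString(), start));
                    start = -1;
                }
            } else if (start < 0) {
                start = i; // First character of a new token.
            }
        }
        return tokens;
    }

    /** Joins the token texts with a single space (ASCII 0x20). */
    public static CharSequence build(List<SimpleToken> tokens) {
        StringBuilder sb = new StringBuilder();
        for (SimpleToken token : tokens) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(token.text);
        }
        return sb;
    }

    public static void main(String[] args) {
        List<SimpleToken> tokens = tokenize("  Hello   token\tworld ");
        System.out.println(tokens.size());   // prints 3
        System.out.println(build(tokens));   // prints Hello token world
    }
}
```

Keeping the start offset on each token is a design choice of this sketch; it lets later stages refer back to the original text even after the token text has been transformed.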

TokenFilter implements the decorator pattern. With TokenizerSupplier, it is possible to implement classes that return a complete chain. Classes that implement ResettableTokenizer should be reused by their clients.
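
The decorator chaining, the supplier of a complete chain, and the reuse of a resettable tokenizer might look roughly like the sketch below. All names here (FilterChainSketch, SimpleTokenizer, SimpleResettable, Whitespace, LowerCase, StopWords, chain, drain) are hypothetical and chosen for illustration; the actual signatures of TokenFilter, TokenizerSupplier, and ResettableTokenizer are not shown in this summary and may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Supplier;

public class FilterChainSketch {

    /** Hypothetical tokenizer contract: returns the next token text, or null when exhausted. */
    public interface SimpleTokenizer {
        String next();
    }

    /** Hypothetical resettable variant: one instance can be pointed at new input. */
    public interface SimpleResettable extends SimpleTokenizer {
        void reset(CharSequence input);
    }

    /** Innermost tokenizer of the chain: splits the input at whitespace. */
    public static class Whitespace implements SimpleResettable {
        private Iterator<String> parts = Collections.<String>emptyIterator();

        public void reset(CharSequence input) {
            String trimmed = input.toString().trim();
            parts = trimmed.isEmpty()
                    ? Collections.<String>emptyIterator()
                    : Arrays.asList(trimmed.split("\\s+")).iterator();
        }

        public String next() {
            return parts.hasNext() ? parts.next() : null;
        }
    }

    /** Decorator that lower-cases every token for a given Locale. */
    public static class LowerCase implements SimpleResettable {
        private final SimpleResettable inner;
        private final Locale locale;

        public LowerCase(SimpleResettable inner, Locale locale) {
            this.inner = inner;
            this.locale = locale;
        }

        public void reset(CharSequence input) {
            inner.reset(input);
        }

        public String next() {
            String token = inner.next();
            return token == null ? null : token.toLowerCase(locale);
        }
    }

    /** Decorator that drops stop words from the token stream. */
    public static class StopWords implements SimpleResettable {
        private final SimpleResettable inner;
        private final Set<String> stopWords;

        public StopWords(SimpleResettable inner, Set<String> stopWords) {
            this.inner = inner;
            this.stopWords = stopWords;
        }

        public void reset(CharSequence input) {
            inner.reset(input);
        }

        public String next() {
            String token = inner.next();
            while (token != null && stopWords.contains(token)) {
                token = inner.next(); // Skip stop words until a regular token or the end.
            }
            return token;
        }
    }

    /** Supplier that hands out a complete, ready-to-use chain. */
    public static Supplier<SimpleResettable> chain() {
        return () -> new StopWords(
                new LowerCase(new Whitespace(), Locale.ROOT),
                new HashSet<>(Arrays.asList("the", "a")));
    }

    /** Resets the tokenizer to the given input and collects all tokens it produces. */
    public static List<String> drain(SimpleResettable tokenizer, CharSequence input) {
        tokenizer.reset(input);
        List<String> out = new ArrayList<>();
        for (String token = tokenizer.next(); token != null; token = tokenizer.next()) {
            out.add(token);
        }
        return out;
    }

    public static void main(String[] args) {
        SimpleResettable tokenizer = chain().get();
        System.out.println(drain(tokenizer, "The quick fox")); // prints [quick, fox]
        System.out.println(drain(tokenizer, "A lazy dog"));    // same instance reused, prints [lazy, dog]
    }
}
```

Because every decorator forwards reset to the tokenizer it wraps, the client only has to build the chain once and can then reuse it for each new input instead of allocating a new chain per text.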