| Interface Summary | |
|---|---|
| ResettableTokenizer | Implementations that are prepared for reuse should implement this interface. |
| Token | A Token represents a part of a tokenized text. |
| Tokenizer | An implementation splits text data into its Tokens. |
| Class Summary | |
|---|---|
| AbstractToken | Simple implementation for reuse. |
| CaseTokenFilter | Transforms the input token to an upper- or lower-cased format for a given Locale. |
| ResettableTokenFilter | A token filter that supports working with a resettable tokenizer. |
| SentenceTokenizer | Tokenizes a text into sentences. |
| StopTokenFilter | Filters stop words out of the token stream. |
| SurroundingTokenFilter | The filter removes surrounding braces and other characters around a token text. |
| TextBuilder | Creates new text sequences from lists of Token or CharSequence. |
| TextBuilder.SpaceBuilder | Simple implementation that concatenates the texts of all tokens, delimited by a space (ASCII 0x20). |
| TokenFilter | Main class to support Tokenizer chaining, also known as the decorator pattern. |
| TokenizerSupplier | Support class for ResettableTokenizer. |
| WhitespaceTokenizer | Tokenizes a sequence of chars at whitespace. |
| Exception Summary | |
|---|---|
| TokenizerException | Thrown if something goes wrong during tokenization. |

The package contains classes and patterns to support the analysis of CharSequences. Implementations of Tokenizer are the basis for breaking CharSequences down into Tokens. TextBuilder builds a new CharSequence from a list of CharSequences or Tokens.

TokenFilter implements the decorator pattern. With TokenizerSupplier it is possible to implement classes that return a complete chain. Classes that implement ResettableTokenizer should be reused by their clients.
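The chaining idea described above can be illustrated with a minimal, self-contained sketch. This summary does not show the package's actual method signatures, so every type and method below (`SimpleTokenizer`, `WhitespaceSplitter`, `LowerCaseDecorator`, `StopWordDecorator`, `joinWithSpaces`, `analyze`) is a hypothetical stand-in and not the package's API. The sketch only shows how a tokenizer can be wrapped by filters in the decorator style and how the resulting tokens can be rebuilt into a space-delimited text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-ins only: the real Tokenizer/TokenFilter signatures
// are not shown in the package summary, so none of these names are the actual API.
public class TokenChainSketch {

    /** Assumed minimal contract: returns the next token text, or null when exhausted. */
    interface SimpleTokenizer {
        String next();
    }

    /** Base tokenizer: splits the input at runs of whitespace. */
    static final class WhitespaceSplitter implements SimpleTokenizer {
        private final String[] parts;
        private int index = 0;

        WhitespaceSplitter(CharSequence text) {
            String trimmed = text.toString().trim();
            this.parts = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        }

        @Override
        public String next() {
            return index < parts.length ? parts[index++] : null;
        }
    }

    /** Decorator: lower-cases each token of the wrapped tokenizer for a given Locale. */
    static final class LowerCaseDecorator implements SimpleTokenizer {
        private final SimpleTokenizer delegate;
        private final Locale locale;

        LowerCaseDecorator(SimpleTokenizer delegate, Locale locale) {
            this.delegate = delegate;
            this.locale = locale;
        }

        @Override
        public String next() {
            String token = delegate.next();
            return token == null ? null : token.toLowerCase(locale);
        }
    }

    /** Decorator: skips tokens that appear in the stop-word set. */
    static final class StopWordDecorator implements SimpleTokenizer {
        private final SimpleTokenizer delegate;
        private final Set<String> stopWords;

        StopWordDecorator(SimpleTokenizer delegate, Set<String> stopWords) {
            this.delegate = delegate;
            this.stopWords = stopWords;
        }

        @Override
        public String next() {
            String token;
            while ((token = delegate.next()) != null) {
                if (!stopWords.contains(token)) {
                    return token;
                }
            }
            return null;
        }
    }

    /** Rebuilds a text by joining all remaining tokens with a single space (ASCII 0x20). */
    static String joinWithSpaces(SimpleTokenizer tokenizer) {
        List<String> tokens = new ArrayList<>();
        for (String t = tokenizer.next(); t != null; t = tokenizer.next()) {
            tokens.add(t);
        }
        return String.join(" ", tokens);
    }

    /** Builds the complete chain in one place, then rebuilds the filtered text. */
    public static String analyze(CharSequence text) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "a"));
        SimpleTokenizer chain = new StopWordDecorator(
                new LowerCaseDecorator(new WhitespaceSplitter(text), Locale.ROOT),
                stopWords);
        return joinWithSpaces(chain);
    }

    public static void main(String[] args) {
        System.out.println(analyze("The quick  brown fox jumps over a lazy dog"));
    }
}
```

Because each decorator only depends on the shared interface, filters can be stacked in any order. In this sketch the lower-casing step runs before the stop-word step so that "The" matches the lower-case stop word "the".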