Package net.sf.eos.analyzer

The package contains classes and patterns to support the analysis of CharSequence instances.


Interface Summary
ResettableTokenizer Implementations that are prepared for reuse should implement this interface.
Token A Token represents a part of a tokenized text.
Tokenizer An implementation splits text data into its Tokens.
 

Class Summary
AbstractToken Simple implementation for reuse.
CaseTokenFilter Transforms the input token to an upper- or lower-cased format for a given Locale.
ResettableTokenFilter A token filter that supports working with a resettable tokenizer.
SentenceTokenizer Tokenizes a text into sentences.
StopTokenFilter Filters stop words out of the token stream.
SurroundingTokenFilter The filter removes surrounding braces and other characters around a token text.
TextBuilder Implementations create new text sequences from lists of Token or CharSequence.
TextBuilder.SpaceBuilder Simple implementation that concatenates the texts of all tokens, delimited by a space (ASCII 0x20).
TokenFilter Main class to support Tokenizer chaining, also known as the decorator pattern.
TokenizerSupplier Support class for ResettableTokenizer.
WhitespaceTokenizer Tokenizes a sequence of chars at whitespace.
 

Exception Summary
TokenizerException Thrown if something goes wrong during tokenization.
 

Package net.sf.eos.analyzer Description

The package contains classes and patterns to support the analysis of CharSequence instances. Implementations of Tokenizer are the basis for splitting CharSequences into Tokens. TextBuilder builds a new CharSequence from a list of CharSequences or Tokens.
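
The split-and-rebuild round trip described above can be sketched as follows. This is a minimal illustration, not the package's actual API: SketchTokenizer, SketchWhitespaceTokenizer and TokenizeAndRebuildSketch are hypothetical stand-ins, plain String values stand in for Token, and the convention that next() returns null at the end of the input is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a tokenizer; NOT the net.sf.eos.analyzer API. */
interface SketchTokenizer {
    /** Returns the next token text, or null when the input is exhausted (assumed contract). */
    String next();
}

/** Splits a CharSequence at runs of whitespace. */
final class SketchWhitespaceTokenizer implements SketchTokenizer {
    private final CharSequence input;
    private int pos = 0;

    SketchWhitespaceTokenizer(CharSequence input) {
        this.input = input;
    }

    @Override
    public String next() {
        // Skip whitespace in front of the next token.
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
        if (pos >= input.length()) {
            return null;
        }
        // Consume characters up to the next whitespace or the end of the input.
        int start = pos;
        while (pos < input.length() && !Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
        return input.subSequence(start, pos).toString();
    }
}

final class TokenizeAndRebuildSketch {
    /** Collects all token texts produced by the tokenizer. */
    static List<String> tokens(SketchTokenizer tokenizer) {
        List<String> result = new ArrayList<>();
        for (String t = tokenizer.next(); t != null; t = tokenizer.next()) {
            result.add(t);
        }
        return result;
    }

    /** Rebuilds a text by joining the parts with a single space (ASCII 0x20). */
    static CharSequence buildText(List<? extends CharSequence> parts) {
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        List<String> parts = tokens(new SketchWhitespaceTokenizer("  split   this\ttext "));
        System.out.println(parts);
        System.out.println(buildText(parts));
    }
}
```

Note that the rebuilt text normalizes every run of whitespace to a single space, so the round trip does not necessarily reproduce the original input.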

The TokenFilter implements the decorator pattern. With the TokenizerSupplier it is possible to implement classes that return a complete chain.
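
As an illustration of the decorator idea, the sketch below chains two filters around a source tokenizer and hands out the finished chain through a supplier. All type names (ChainTokenizer, ListTokenizer, ChainFilter, LowerCaseFilter, StopWordFilter, ChainSketch) are hypothetical, and java.util.function.Supplier is used only as a stand-in for the role of TokenizerSupplier; the real signatures in this package may differ.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Locale;
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical stand-in for a tokenizer; NOT the net.sf.eos.analyzer API. */
interface ChainTokenizer {
    /** Returns the next token text, or null when exhausted (assumed contract). */
    String next();
}

/** Source tokenizer that simply iterates over a fixed list of texts. */
final class ListTokenizer implements ChainTokenizer {
    private final Iterator<String> it;

    ListTokenizer(String... texts) {
        this.it = Arrays.asList(texts).iterator();
    }

    @Override
    public String next() {
        return it.hasNext() ? it.next() : null;
    }
}

/** Decorator base class: wraps another tokenizer and is itself a tokenizer. */
abstract class ChainFilter implements ChainTokenizer {
    protected final ChainTokenizer source;

    ChainFilter(ChainTokenizer source) {
        this.source = source;
    }
}

/** Lower-cases every token for the given Locale. */
final class LowerCaseFilter extends ChainFilter {
    private final Locale locale;

    LowerCaseFilter(ChainTokenizer source, Locale locale) {
        super(source);
        this.locale = locale;
    }

    @Override
    public String next() {
        String t = source.next();
        return t == null ? null : t.toLowerCase(locale);
    }
}

/** Drops tokens that are contained in the stop word set. */
final class StopWordFilter extends ChainFilter {
    private final Set<String> stopWords;

    StopWordFilter(ChainTokenizer source, Set<String> stopWords) {
        super(source);
        this.stopWords = stopWords;
    }

    @Override
    public String next() {
        String t = source.next();
        // Skip tokens until one is not a stop word or the source is exhausted.
        while (t != null && stopWords.contains(t)) {
            t = source.next();
        }
        return t;
    }
}

final class ChainSketch {
    /** Returns a supplier that builds the complete chain: source, lower-casing, stop word removal. */
    static Supplier<ChainTokenizer> chainFor(String... texts) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "a"));
        return () -> new StopWordFilter(
                new LowerCaseFilter(new ListTokenizer(texts), Locale.ENGLISH), stopWords);
    }

    /** Reads the tokenizer to its end and joins the token texts with spaces. */
    static String drain(ChainTokenizer tokenizer) {
        StringBuilder sb = new StringBuilder();
        for (String t = tokenizer.next(); t != null; t = tokenizer.next()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(t);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(drain(chainFor("The", "Quick", "a", "Fox").get()));
    }
}
```

The order of the decorators matters in this sketch: the stop word filter sits outside the lower-casing filter, so it only ever sees lower-cased tokens and its stop word set can be kept in lower case.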

Classes that implement the ResettableTokenizer should be reused by their clients.
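
What such reuse could look like on the client side is sketched below. The ReusableTokenizer interface and its reset(CharSequence) method are assumptions made for illustration only; the actual contract of ResettableTokenizer is not shown on this page.

```java
/** Hypothetical stand-in for a resettable tokenizer; NOT the net.sf.eos.analyzer API. */
interface ReusableTokenizer {
    /** Returns the next token text, or null when exhausted (assumed contract). */
    String next();

    /** Discards any remaining state and starts over on new input (assumed method). */
    void reset(CharSequence input);
}

/** Whitespace tokenizer whose input can be replaced between runs. */
final class ReusableWhitespaceTokenizer implements ReusableTokenizer {
    private CharSequence input = "";
    private int pos = 0;

    @Override
    public void reset(CharSequence newInput) {
        this.input = newInput;
        this.pos = 0;
    }

    @Override
    public String next() {
        // Skip whitespace in front of the next token.
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
        if (pos >= input.length()) {
            return null;
        }
        int start = pos;
        while (pos < input.length() && !Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
        return input.subSequence(start, pos).toString();
    }
}

final class ReuseSketch {
    /** Resets the shared tokenizer to the given text and counts its tokens. */
    static int countTokens(ReusableTokenizer tokenizer, CharSequence text) {
        tokenizer.reset(text);
        int count = 0;
        while (tokenizer.next() != null) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // One tokenizer instance is created once and reused for every text.
        ReusableTokenizer tokenizer = new ReusableWhitespaceTokenizer();
        for (String text : new String[] {"one two three", "four", ""}) {
            System.out.println(countTokens(tokenizer, text));
        }
    }
}
```

In this sketch the client creates the tokenizer once and calls reset before each new text instead of allocating a new instance per input.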

Since:
0.1.0
Author:
Sascha Kohlmann


Copyright © 2008. All Rights Reserved.