net.sf.eos.analyzer
Class SentenceTokenizer

java.lang.Object
  extended by net.sf.eos.analyzer.SentenceTokenizer
All Implemented Interfaces:
ResettableTokenizer, Tokenizer

public class SentenceTokenizer
extends Object
implements ResettableTokenizer

Tokenizes a text into sentences.

Based on BreakIterator.getLineInstance(Locale).

Author:
Sascha Kohlmann
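
The description above says the class builds on java.text.BreakIterator. As a rough illustration of that JDK facility only (not this class's actual source), the sketch below walks the boundaries reported by a BreakIterator and collects the text between them. The class name SentenceSplitSketch and the helper split are hypothetical. The demo passes BreakIterator.getSentenceInstance(Locale) purely for illustration; the page itself names getLineInstance(Locale), and any other BreakIterator can be passed to the helper instead.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch class; it does not use or reproduce the eos sources.
class SentenceSplitSketch {

    /** Collects the trimmed, non-empty segments between consecutive boundaries. */
    static List<String> split(BreakIterator boundaries, String text) {
        List<String> segments = new ArrayList<>();
        boundaries.setText(text);
        int start = boundaries.first();
        for (int end = boundaries.next(); end != BreakIterator.DONE;
                start = end, end = boundaries.next()) {
            String segment = text.substring(start, end).trim();
            if (!segment.isEmpty()) {
                segments.add(segment);
            }
        }
        return segments;
    }

    public static void main(String[] args) {
        // Assumption: a sentence iterator is used here only to show sentence boundaries.
        BreakIterator sentences = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        for (String s : split(sentences, "Hello world. How are you?")) {
            System.out.println(s);
        }
    }
}
```

Passing the iterator as a parameter keeps the boundary rule (sentence, line, word) separate from the loop that consumes it.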

Field Summary
static String SENTENCE_TYPE
           
 
Constructor Summary
SentenceTokenizer()
           
SentenceTokenizer(CharSequence text)
          Creates a new tokenizer using the default Locale.
SentenceTokenizer(CharSequence text, Locale locale)
          Creates a new tokenizer.
 
Method Summary
 Token next()
          Returns the next token, or null if no further token is available.
protected  CharSequence nextSentence()
          Override this method to implement a different sentence tokenizer.
 void reset(CharSequence input)
          Initializes the tokenizer with new input data.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

SENTENCE_TYPE

public static final String SENTENCE_TYPE
See Also:
Constant Field Values
Constructor Detail

SentenceTokenizer

public SentenceTokenizer()

SentenceTokenizer

public SentenceTokenizer(CharSequence text)
Creates a new tokenizer that uses the default Locale.

Parameters:
text - the text to tokenize into sentences.

SentenceTokenizer

public SentenceTokenizer(CharSequence text,
                         Locale locale)
Creates a new tokenizer.

Parameters:
text - the text to tokenize into sentences.
locale - the locale used to determine the break boundaries.
Method Detail

next

public Token next()
           throws TokenizerException
Description copied from interface: Tokenizer
Returns the next token, or null if no further token is available.

Specified by:
next in interface Tokenizer
Returns:
the next token or null
Throws:
TokenizerException

reset

public void reset(CharSequence input)
           throws TokenizerException
Description copied from interface: ResettableTokenizer
Initializes the tokenizer with new input data.

Specified by:
reset in interface ResettableTokenizer
Parameters:
input - represents new input data for the tokenizer.
Throws:
TokenizerException
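
The next and reset contracts above suggest a common calling pattern: call next until it returns null, then call reset to reuse the same instance on new input. The sketch below shows that pattern in isolation. StandInTokenizer and LineTokenizer are hypothetical stand-ins written for this example. They are not the eos Tokenizer or ResettableTokenizer interfaces, and they return plain strings because the accessors of Token are not documented on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

class NullTerminatedLoopSketch {

    /** Hypothetical stand-in mirroring the "next or null" and "reset" contract. */
    interface StandInTokenizer {
        String next();                   // next item, or null when exhausted
        void reset(CharSequence input);  // start over with new input
    }

    /** Toy implementation that splits on line breaks, only to make the loop runnable. */
    static class LineTokenizer implements StandInTokenizer {
        private Iterator<String> lines = Collections.<String>emptyIterator();

        public String next() {
            return lines.hasNext() ? lines.next() : null;
        }

        public void reset(CharSequence input) {
            lines = Arrays.asList(input.toString().split("\n")).iterator();
        }
    }

    /** Reads items until the tokenizer signals exhaustion by returning null. */
    static List<String> drain(StandInTokenizer tokenizer) {
        List<String> out = new ArrayList<>();
        for (String item = tokenizer.next(); item != null; item = tokenizer.next()) {
            out.add(item);
        }
        return out;
    }

    public static void main(String[] args) {
        LineTokenizer tokenizer = new LineTokenizer();
        tokenizer.reset("first\nsecond");
        System.out.println(drain(tokenizer));
        tokenizer.reset("third");        // same instance, new input
        System.out.println(drain(tokenizer));
    }
}
```

With the real classes, the loop would additionally have to handle the checked TokenizerException declared by next and reset.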

nextSentence

protected CharSequence nextSentence()
                             throws TokenizerException
Override this method to implement a different sentence tokenizer.

Returns:
the next sentence, or null if no further sentence is available.
Throws:
TokenizerException - if an error occurs
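
Overriding a protected hook such as nextSentence is an instance of the template-method pattern: the public entry point stays the same while a subclass replaces the splitting rule. The sketch below shows the idea with a hypothetical stand-in base class. StandInSentenceTokenizer, its fields, and the assumption that next delegates to nextSentence are all inventions for this example and may not match the real SentenceTokenizer internals.

```java
class OverrideHookSketch {

    /** Hypothetical stand-in base class with a protected hook; NOT the eos class. */
    static class StandInSentenceTokenizer {
        protected CharSequence text = "";
        protected int position = 0;

        public void reset(CharSequence input) {
            text = input;
            position = 0;
        }

        /** Default hook: returns the remaining input once, then null. */
        protected CharSequence nextSentence() {
            if (position >= text.length()) {
                return null;
            }
            CharSequence rest = text.subSequence(position, text.length());
            position = text.length();
            return rest;
        }

        /** Assumed delegation: the public method asks the hook for the next piece. */
        public CharSequence next() {
            return nextSentence();
        }
    }

    /** Subclass that swaps in a different rule: pieces end at a semicolon. */
    static class SemicolonTokenizer extends StandInSentenceTokenizer {
        @Override
        protected CharSequence nextSentence() {
            if (position >= text.length()) {
                return null;             // same "null when exhausted" contract
            }
            int end = text.toString().indexOf(';', position);
            if (end < 0) {
                end = text.length();
            }
            CharSequence piece = text.subSequence(position, end);
            position = Math.min(end + 1, text.length());
            return piece;
        }
    }

    public static void main(String[] args) {
        SemicolonTokenizer tokenizer = new SemicolonTokenizer();
        tokenizer.reset("one;two");
        for (CharSequence s = tokenizer.next(); s != null; s = tokenizer.next()) {
            System.out.println(s);
        }
    }
}
```

An override written against the real class would also need to honor the documented contract: return null when no sentence remains and signal failures with TokenizerException.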


Copyright © 2008. All Rights Reserved.