net.sf.eos.sentence
Class DefaultSentencer

java.lang.Object
  extended by net.sf.eos.config.Configured
      extended by net.sf.eos.sentence.Sentencer
          extended by net.sf.eos.sentence.DefaultSentencer
All Implemented Interfaces:
Configurable

public class DefaultSentencer
extends Sentencer

Simple default implementation.

Author:
Sascha Kohlmann

Field Summary
 
Fields inherited from class net.sf.eos.sentence.Sentencer
DEFAULT_MESSAGE_DIGEST, MESSAGE_DIGEST_CONFIG_NAME, SENTENCER_IMPL_CONFIG_NAME
 
Constructor Summary
DefaultSentencer()
          Creates a new instance.
 
Method Summary
 Map<String,EosDocument> toSentenceDocuments(EosDocument doc, SentenceTokenizer sentencer, ResettableTokenizer tokenizer, TextBuilder builder)
          Fragments a document into documents of sentences.
 
Methods inherited from class net.sf.eos.sentence.Sentencer
createDigester, newInstance
 
Methods inherited from class net.sf.eos.config.Configured
configure, getConfiguration
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DefaultSentencer

public DefaultSentencer()
Creates a new instance.

Method Detail

toSentenceDocuments

public Map<String,EosDocument> toSentenceDocuments(EosDocument doc,
                                                   SentenceTokenizer sentencer,
                                                   ResettableTokenizer tokenizer,
                                                   TextBuilder builder)
                                            throws EosException
Description copied from class: Sentencer
Fragments a document into documents of sentences. The return value is a map from message digest to sentence document. Each returned document carries all metadata of the original document and may carry additional metadata.

Specified by:
toSentenceDocuments in class Sentencer
Parameters:
doc - the document to fragment
sentencer - a sentence tokenizer instance used to split the document into sentences
tokenizer - a tokenizer instance to tokenize the result of the sentencer
builder - a builder that supports rebuilding text from the tokenizer output
Returns:
a map of message digest -> sentence document entries
Throws:
EosException - if an error occurs
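
The shape of the return value described above (message digest -> sentence document) can be illustrated with a minimal, self-contained sketch. It does not use the EOS classes and is not the DefaultSentencer implementation: the class SentenceDigestSketch, its methods, the use of java.text.BreakIterator for sentence splitting, plain strings standing in for EosDocument, and the "MD5" algorithm name are all assumptions made for illustration only.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.text.BreakIterator;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: maps a digest of each sentence to the sentence text. */
public class SentenceDigestSketch {

    /** Returns the lowercase hex digest of the UTF-8 bytes of the given text. */
    static String hexDigest(String text, String algorithm) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            byte[] hash = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Wrapped so callers need not handle a checked exception.
            throw new IllegalStateException("unknown digest: " + algorithm, e);
        }
    }

    /**
     * Splits the text into sentences and keys each one by its digest.
     * A plain String stands in for a sentence document here (assumption).
     */
    static Map<String, String> toSentences(String text, String algorithm) {
        Map<String, String> result = new LinkedHashMap<>();
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                // Identical sentences yield the same digest and share one entry.
                result.put(hexDigest(sentence, algorithm), sentence);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> sentences =
                toSentences("First sentence. Second sentence.", "MD5");
        sentences.forEach((digest, s) -> System.out.println(digest + " -> " + s));
    }
}
```

In this sketch, keying by digest means that two identical sentences collapse into a single map entry. Whether DefaultSentencer behaves the same way depends on what it feeds into the digester, which this page does not state.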


Copyright © 2008. All Rights Reserved.