java.lang.Object
net.sf.eos.config.Configured
net.sf.eos.sentence.Sentencer
public abstract class Sentencer
The implementation fragments an EosDocument that contains more than one
sentence into several documents, each of which may hold only a single
sentence. Each sentence is also represented by a hash code, which makes it
possible to remove duplicate sentences from a corpus.
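The following sketch illustrates how a per-sentence hash can support duplicate removal. It is not the library's code: the class `SentenceDedupSketch` and its methods are hypothetical, and it assumes MD5 over the UTF-8 bytes of each sentence with a hex-encoded result.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of net.sf.eos. */
final class SentenceDedupSketch {

    /** Returns the lowercase hex MD5 digest of the sentence's UTF-8 bytes. */
    static String hashOf(String sentence) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] raw = md.digest(sentence.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(raw.length * 2);
            for (byte b : raw) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 must be supported by every Java SE implementation.
            throw new IllegalStateException(e);
        }
    }

    /** Keeps the first occurrence of each distinct sentence, keyed by its hash. */
    static Map<String, String> deduplicate(List<String> sentences) {
        Map<String, String> unique = new LinkedHashMap<>();
        for (String sentence : sentences) {
            unique.putIfAbsent(hashOf(sentence), sentence);
        }
        return unique;
    }
}
```

Because identical sentences produce identical keys, sentences collected from many documents can be merged into one map and repeated entries drop out.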
| Field Summary | |
|---|---|
| static String | DEFAULT_MESSAGE_DIGEST - The default message digest algorithm. |
| static String | MESSAGE_DIGEST_CONFIG_NAME - The name of the algorithm of the message digest. |
| static String | SENTENCER_IMPL_CONFIG_NAME - The configuration key name for the classname of the implementation. |
| Constructor Summary | |
|---|---|
| protected | Sentencer() - Creates a new instance. |
| Method Summary | |
|---|---|
| protected MessageDigest | createDigester() - Returns the message digest implementation. |
| static Sentencer | newInstance(Configuration config) - Creates a new instance of the implementation. |
| abstract Map<String,EosDocument> | toSentenceDocuments(EosDocument doc, SentenceTokenizer sentencer, ResettableTokenizer tokenizer, TextBuilder builder) - Fragments a document into documents of sentences. |
| Methods inherited from class net.sf.eos.config.Configured |
|---|
| configure, getConfiguration |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
public static final String DEFAULT_MESSAGE_DIGEST
@ConfigurationKey(type=CLASSNAME,
defaultValue="md5",
description="The message digest.")
public static final String MESSAGE_DIGEST_CONFIG_NAME
@ConfigurationKey(type=CLASSNAME,
description="Configuration key of the sentencer.")
public static final String SENTENCER_IMPL_CONFIG_NAME
See Also: newInstance(Configuration), Constant Field Values
| Constructor Detail |
|---|
protected Sentencer()
| Method Detail |
|---|
@FactoryMethod(key="net.sf.eos.sentence.Sentencer.impl",
implementation=DefaultSentencer.class)
public static final Sentencer newInstance(Configuration config)
throws EosException
Creates a new instance of the implementation. If the Configuration
contains the key SENTENCER_IMPL_CONFIG_NAME, a new instance of the class
named by its value is created. If no value is set, the DefaultSentencer
is instantiated.
config - the configuration
EosException - if the instance cannot be created
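The lookup described above can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: a plain `Map<String, String>` stands in for `Configuration`, the classes `SketchSentencer`, `DefaultSketchSentencer` and `CustomSketchSentencer` are hypothetical stand-ins, and an unchecked `IllegalStateException` replaces `EosException`. Only the key string is taken from the `@FactoryMethod` annotation.

```java
import java.util.Map;

/** Hypothetical stand-in for the abstract Sentencer type. */
abstract class SketchSentencer {

    /** Key string as shown in the @FactoryMethod annotation. */
    static final String IMPL_KEY = "net.sf.eos.sentence.Sentencer.impl";

    /** Instantiates the configured class, or the default if no value is set. */
    static SketchSentencer newInstance(Map<String, String> config) {
        String className = config.get(IMPL_KEY);
        if (className == null || className.isEmpty()) {
            return new DefaultSketchSentencer();
        }
        try {
            // Load the class by name and verify it is a SketchSentencer subtype.
            Class<? extends SketchSentencer> clazz =
                    Class.forName(className).asSubclass(SketchSentencer.class);
            return clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }
}

/** Hypothetical default implementation. */
class DefaultSketchSentencer extends SketchSentencer { }

/** Hypothetical alternative implementation selected via configuration. */
class CustomSketchSentencer extends SketchSentencer { }
```

Wrapping every reflection failure in a single exception type mirrors the documented contract, in which any failure to instantiate is reported through one exception.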
protected MessageDigest createDigester()
throws EosException
Returns the message digest implementation. If the configuration contains
no value for the key MESSAGE_DIGEST_CONFIG_NAME, the default digest is
used.
EosException - if it is not possible to create the message digest
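A minimal sketch of the fallback behaviour, assuming a `Map<String, String>` in place of the configuration. The class `DigesterSketch` and the key value `sketch.messageDigest` are hypothetical, since this page does not show the actual key string. The default `md5` is the `defaultValue` shown in the field annotation, and an unchecked exception stands in for `EosException`.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;

/** Hypothetical helper, not part of net.sf.eos. */
final class DigesterSketch {

    /** Hypothetical configuration key; the real key value is not shown here. */
    static final String DIGEST_KEY = "sketch.messageDigest";

    /** Default algorithm name, matching the documented defaultValue. */
    static final String DEFAULT_DIGEST = "md5";

    /** Creates the configured digest, or the default if no value is set. */
    static MessageDigest createDigester(Map<String, String> config) {
        String algorithm = config.getOrDefault(DIGEST_KEY, DEFAULT_DIGEST);
        try {
            return MessageDigest.getInstance(algorithm);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("Unknown message digest: " + algorithm, e);
        }
    }
}
```

With an empty map the sketch yields an MD5 digest. Setting the key to, for example, `SHA-256` selects that algorithm instead.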
public abstract Map<String,EosDocument> toSentenceDocuments(EosDocument doc,
SentenceTokenizer sentencer,
ResettableTokenizer tokenizer,
TextBuilder builder)
throws EosException
doc - the document to fragment
sentencer - a sentencer instance
tokenizer - a tokenizer instance to tokenize the result of the sentencer
builder - the builder that supports the rebuilding of the tokenizer
EosException - if an error occurs
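To give a rough idea of what fragmenting a document into sentence documents can look like, the sketch below works on plain strings. It makes several assumptions: `java.text.BreakIterator` replaces the `SentenceTokenizer`, the tokenizer and builder steps are left out, each sentence is a `String` rather than an `EosDocument`, and the map key is taken to be a hex MD5 hash of the sentence, which this page does not confirm. The class `FragmentSketch` is hypothetical.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.text.BreakIterator;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper, not part of net.sf.eos. */
final class FragmentSketch {

    /** Splits text into sentences and maps an assumed hash key to each one. */
    static Map<String, String> toSentences(String text) {
        Map<String, String> result = new LinkedHashMap<>();
        BreakIterator boundaries = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        boundaries.setText(text);
        int start = boundaries.first();
        for (int end = boundaries.next(); end != BreakIterator.DONE;
                start = end, end = boundaries.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                // A repeated sentence maps to the same key and is stored once.
                result.putIfAbsent(key(sentence), sentence);
            }
        }
        return result;
    }

    /** Hex MD5 of the sentence's UTF-8 bytes, padded to 32 characters. */
    private static String key(String sentence) {
        try {
            byte[] raw = MessageDigest.getInstance("MD5")
                    .digest(sentence.getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, raw));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```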