java.lang.Object
  net.sf.eos.config.Configured
      net.sf.eos.sentence.Sentencer
public abstract class Sentencer
An implementation fragments an EosDocument that contains more than one
sentence into several documents, each of which may hold only a single
sentence. Each sentence is also represented by a hash code. The hash code
makes it possible to remove duplicate sentences from a corpus.
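The hash-based duplicate removal described above can be sketched roughly as follows. This is only an illustration built on the JDK's `java.security.MessageDigest`, not the library's own code: the class `SentenceDedupSketch` and its methods `digestOf` and `deduplicate` are hypothetical names, and plain strings stand in for `EosDocument` instances.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: plain strings stand in for sentence documents.
public class SentenceDedupSketch {

    /** Returns the hex-encoded digest of a sentence for the given JDK algorithm name. */
    public static String digestOf(String sentence, String algorithm) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance(algorithm);
        } catch (NoSuchAlgorithmException e) {
            // Wrapped so callers do not have to handle a checked exception.
            throw new IllegalArgumentException("Unknown digest algorithm: " + algorithm, e);
        }
        byte[] hash = md.digest(sentence.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder(hash.length * 2);
        for (byte b : hash) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Keys each sentence by its digest; a repeated sentence hits an existing key and is skipped. */
    public static Map<String, String> deduplicate(List<String> sentences, String algorithm) {
        Map<String, String> byDigest = new LinkedHashMap<>();
        for (String sentence : sentences) {
            byDigest.putIfAbsent(digestOf(sentence, algorithm), sentence);
        }
        return byDigest;
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList(
                "A first sentence.", "A second sentence.", "A first sentence.");
        Map<String, String> unique = deduplicate(sentences, "MD5");
        // Prints each surviving sentence together with its digest key.
        unique.forEach((digest, text) -> System.out.println(digest + " -> " + text));
    }
}
```

Keying the result map by digest mirrors the `Map<String,EosDocument>` return type of `toSentenceDocuments`, although whether the library uses the digest as the map key is an assumption here.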
Field Summary | |
---|---|
static String | DEFAULT_MESSAGE_DIGEST: The default message digest algorithm. |
static String | MESSAGE_DIGEST_CONFIG_NAME: The name of the algorithm of the message digest. |
static String | SENTENCER_IMPL_CONFIG_NAME: The configuration key name for the classname of the implementation. |
Constructor Summary | |
---|---|
protected | Sentencer(): Creates a new instance. |
Method Summary | |
---|---|
protected MessageDigest | createDigester(): Returns the message digest implementation. |
static Sentencer | newInstance(Configuration config): Creates a new instance of the implementation. |
abstract Map<String,EosDocument> | toSentenceDocuments(EosDocument doc, SentenceTokenizer sentencer, ResettableTokenizer tokenizer, TextBuilder builder): Fragments a document into documents of sentences. |
Methods inherited from class net.sf.eos.config.Configured |
---|
configure, getConfiguration |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final String DEFAULT_MESSAGE_DIGEST
@ConfigurationKey(type=CLASSNAME, defaultValue="md5", description="The message digest.")
public static final String MESSAGE_DIGEST_CONFIG_NAME
@ConfigurationKey(type=CLASSNAME, description="Configuration key of the sentencer.")
public static final String SENTENCER_IMPL_CONFIG_NAME
See Also: newInstance(Configuration), Constant Field Values
Constructor Detail |
---|
protected Sentencer()
Method Detail |
---|
@FactoryMethod(key="net.sf.eos.sentence.Sentencer.impl", implementation=DefaultSentencer.class)
public static final Sentencer newInstance(Configuration config) throws EosException
Creates a new instance of the implementation. If the Configuration contains the key SENTENCER_IMPL_CONFIG_NAME, the class named by its value is instantiated. If no value is set, DefaultSentencer is instantiated.
Parameters:
config - the configuration
Throws:
EosException - if it is not possible to instantiate an instance
protected MessageDigest createDigester() throws EosException
Returns the message digest implementation. If the configuration contains no value for the key MESSAGE_DIGEST_CONFIG_NAME, the default digest is used.
Throws:
EosException - if it is not possible to create the message digest
public abstract Map<String,EosDocument> toSentenceDocuments(EosDocument doc, SentenceTokenizer sentencer, ResettableTokenizer tokenizer, TextBuilder builder) throws EosException
Fragments a document into documents of sentences.
Parameters:
doc - the document to fragment
sentencer - a sentencer instance
tokenizer - a tokenizer instance to tokenize the result of the sentencer
builder - a builder that supports rebuilding the tokenizer output
Throws:
EosException - if an error occurs
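The configuration-driven behavior of newInstance and createDigester (use the configured value if present, otherwise fall back to a default) might look roughly like the sketch below. It is a self-contained approximation rather than the actual Sentencer code: a `Map<String,String>` stands in for `Configuration`, `IllegalStateException` stands in for `EosException`, and the types `Splitter`, `DefaultSplitter`, `OtherSplitter` as well as the key `example.messageDigest` are hypothetical. Only the implementation key and the `"md5"` default are taken from the annotations shown above.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a factory with a default fallback; not the library's code.
public class ConfiguredFactorySketch {

    // Key taken from the @FactoryMethod annotation in the documentation above.
    public static final String IMPL_KEY = "net.sf.eos.sentence.Sentencer.impl";
    // Hypothetical key; the real value of MESSAGE_DIGEST_CONFIG_NAME is not shown in the source.
    public static final String DIGEST_KEY = "example.messageDigest";
    // Default taken from the defaultValue of the @ConfigurationKey annotation.
    public static final String DEFAULT_DIGEST = "md5";

    /** Hypothetical stand-in for the abstract Sentencer type. */
    public interface Splitter { }

    /** Hypothetical stand-in for DefaultSentencer. */
    public static class DefaultSplitter implements Splitter { }

    /** Hypothetical alternative implementation selected through configuration. */
    public static class OtherSplitter implements Splitter { }

    /** Instantiates the configured class, or the default one if the key is absent. */
    public static Splitter newInstance(Map<String, String> config) {
        String className = config.getOrDefault(IMPL_KEY, DefaultSplitter.class.getName());
        try {
            return Class.forName(className)
                    .asSubclass(Splitter.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }

    /** Looks up the configured digest algorithm, falling back to the default. */
    public static MessageDigest createDigester(Map<String, String> config) {
        String algorithm = config.getOrDefault(DIGEST_KEY, DEFAULT_DIGEST);
        try {
            return MessageDigest.getInstance(algorithm);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("Cannot create digest " + algorithm, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        // No keys set: both lookups fall back to their defaults.
        System.out.println(newInstance(config).getClass().getSimpleName());
        System.out.println(createDigester(config).getDigestLength() + " byte digest");
    }
}
```

Wrapping the reflective and security exceptions in a single unchecked exception is a simplification for the sketch; the documented methods declare the checked `EosException` instead.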