net.sf.eos.hadoop.mapred.cooccurrence
Class DictionaryBasedEntityIdKeyGenerator

java.lang.Object
  extended by net.sf.eos.config.Configured
      extended by net.sf.eos.hadoop.mapred.cooccurrence.DictionaryBasedEntityIdKeyGenerator
All Implemented Interfaces:
Configurable

public class DictionaryBasedEntityIdKeyGenerator
extends Configured


Constructor Summary
DictionaryBasedEntityIdKeyGenerator()
           
 
Method Summary
 Map<Text,EosDocument> createKeysForDocument(EosDocument doc)
           
protected  DictionaryBasedEntityRecognizer getDictionaryBasedEntityRecognizerForText(CharSequence text)
          Creates a new DictionaryBasedEntityRecognizer for the given text.
protected  ResettableTokenizer getTokenizer()
          Returns the tokenizer that serves as the token source for the recognizer.
 Trie<CharSequence,Set<CharSequence>> getTrie()
           
 void setTrie(Trie<CharSequence,Set<CharSequence>> trie)
           
 
Methods inherited from class net.sf.eos.config.Configured
configure, getConfiguration
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DictionaryBasedEntityIdKeyGenerator

public DictionaryBasedEntityIdKeyGenerator()

Method Detail

createKeysForDocument

public Map<Text,EosDocument> createKeysForDocument(EosDocument doc)
                                            throws EosException
Throws:
EosException

getDictionaryBasedEntityRecognizerForText

protected DictionaryBasedEntityRecognizer getDictionaryBasedEntityRecognizerForText(CharSequence text)
Creates a new DictionaryBasedEntityRecognizer for the given text. The instance is created with the factory method AbstractDictionaryBasedEntityRecognizer.newInstance(net.sf.eos.analyzer.Tokenizer, Configuration), using getTokenizer() as the token source.

Parameters:
text - the text to tokenize
Returns:
a new instance
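
The wiring described above (obtain a tokenizer, point it at the text, pass it to a static factory together with the configuration) might look roughly like the following sketch. Every type here is a hypothetical stand-in built on the JDK only: TokenSource, WhitespaceSource, Recognizer and recognizerForText are not EOS classes or methods, and resetting the tokenizer with the text is an assumption suggested by the name ResettableTokenizer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins only; none of these types belong to the EOS library.
final class RecognizerFactorySketch {

    /** Stand-in for a resettable tokenizer: reset it with a text, then pull tokens. */
    interface TokenSource {
        void reset(CharSequence text);
        String next(); // returns null when no tokens are left
    }

    /** Splits the text on whitespace (assumed behaviour, for illustration). */
    static final class WhitespaceSource implements TokenSource {
        private Iterator<String> tokens = Collections.emptyIterator();

        public void reset(CharSequence text) {
            String trimmed = text.toString().trim();
            tokens = trimmed.isEmpty()
                    ? Collections.<String>emptyIterator()
                    : Arrays.asList(trimmed.split("\\s+")).iterator();
        }

        public String next() {
            return tokens.hasNext() ? tokens.next() : null;
        }
    }

    /** Stand-in recognizer that is only created through its static factory. */
    static final class Recognizer {
        private final TokenSource source;
        private final Map<String, String> config;

        private Recognizer(TokenSource source, Map<String, String> config) {
            this.source = source;
            this.config = config;
        }

        static Recognizer newInstance(TokenSource source, Map<String, String> config) {
            return new Recognizer(source, config);
        }

        /** Drains the underlying source and returns the tokens in order. */
        List<String> tokens() {
            List<String> result = new ArrayList<>();
            for (String t = source.next(); t != null; t = source.next()) {
                result.add(t);
            }
            return result;
        }
    }

    /** Plays the role of getTokenizer(): supplies the source for the recognizer. */
    static TokenSource getTokenSource() {
        return new WhitespaceSource();
    }

    /** Plays the role of getDictionaryBasedEntityRecognizerForText(CharSequence). */
    static Recognizer recognizerForText(CharSequence text, Map<String, String> config) {
        TokenSource source = getTokenSource();
        source.reset(text);
        return Recognizer.newInstance(source, config);
    }

    public static void main(String[] args) {
        Recognizer recognizer = recognizerForText("a b  c", Map.of());
        System.out.println(recognizer.tokens());
    }
}
```

Keeping the tokenizer behind its own protected accessor means a subclass could swap the token source without touching the factory call, which is presumably why both methods are protected here.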

getTokenizer

protected ResettableTokenizer getTokenizer()
                                    throws TokenizerException
Returns the tokenizer that serves as the token source for the recognizer.

Returns:
the source for the recognizer
Throws:
TokenizerException - if an error occurs

getTrie

public Trie<CharSequence,Set<CharSequence>> getTrie()

setTrie

public void setTrie(Trie<CharSequence,Set<CharSequence>> trie)
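
The trie type Trie<CharSequence,Set<CharSequence>> suggests a dictionary that maps a term to a set of entity IDs, and createKeysForDocument returns one map entry per key. The sketch below illustrates that reading under stated assumptions: a plain java.util.Map stands in for the Trie, String stands in for both Text and EosDocument, tokens are matched one at a time on whitespace boundaries, and the names EntityKeySketch and createKeys are hypothetical. It is not the class's actual implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: JDK types stand in for Trie, Text and EosDocument.
final class EntityKeySketch {

    /**
     * Returns one entry per entity ID whose dictionary term occurs in the text.
     * The value of each entry is the text itself, standing in for the document.
     */
    static Map<String, String> createKeys(Map<String, Set<String>> dictionary, String text) {
        Map<String, String> keys = new LinkedHashMap<>();
        for (String token : text.split("\\s+")) {
            Set<String> ids = dictionary.get(token);
            if (ids == null) {
                continue; // token is not a dictionary term
            }
            for (String id : ids) {
                keys.put(id, text);
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        // Made-up terms and IDs, for illustration only.
        Map<String, Set<String>> dictionary = new LinkedHashMap<>();
        dictionary.put("aspirin", Set.of("ID-1"));
        dictionary.put("ibuprofen", Set.of("ID-2"));

        Map<String, String> keys = createKeys(dictionary, "aspirin and ibuprofen interact");
        System.out.println(keys.keySet()); // prints [ID-1, ID-2]
    }
}
```

Emitting the same document under every recognized entity ID is what would let a later reduce step group documents by entity, which fits the cooccurrence package this class lives in.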


Copyright © 2008. All Rights Reserved.