net.sf.eos.hadoop.mapred.cooccurrence
Class DictionaryBasedEntityRecognizerMapper

java.lang.Object
  extended by org.apache.hadoop.mapred.MapReduceBase
      extended by net.sf.eos.hadoop.mapred.EosDocumentSupportMapReduceBase
          extended by net.sf.eos.hadoop.mapred.cooccurrence.DictionaryBasedEntityRecognizerMapper
All Implemented Interfaces:
Closeable, JobConfigurable, Mapper<LongWritable,Text,Text,Text>

public class DictionaryBasedEntityRecognizerMapper
extends EosDocumentSupportMapReduceBase
implements Mapper<LongWritable,Text,Text,Text>


Constructor Summary
DictionaryBasedEntityRecognizerMapper()
           
 
Method Summary
 void close()
           
 void configure(JobConf conf)
          Sets the configuration and calls configureTrie().
protected  void configureTrie()
          Configures the trie.
protected  ResettableTokenizer getTokenizer()
          Returns a Tokenizer as the source for the recognizer.
protected  Trie<CharSequence,Set<CharSequence>> getTrie()
          Returns a Trie instance.
 void map(LongWritable positionInFile, Text eosDoc, OutputCollector<Text,Text> outputCollector, Reporter reporter)
           
 
Methods inherited from class net.sf.eos.hadoop.mapred.EosDocumentSupportMapReduceBase
eosDocumentToText, getSerializer, textToEosDocument
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DictionaryBasedEntityRecognizerMapper

public DictionaryBasedEntityRecognizerMapper()

Method Detail

map

public void map(LongWritable positionInFile,
                Text eosDoc,
                OutputCollector<Text,Text> outputCollector,
                Reporter reporter)
         throws IOException
Specified by:
map in interface Mapper<LongWritable,Text,Text,Text>
Throws:
IOException

configureTrie

protected void configureTrie()
Configures the trie. After this method finishes, getTrie() is expected to return the configured Trie instance. If DistributedCacheStrategy.STRATEGY_IMPL_CONFIG_NAME is set, its value is used to obtain the distributed cache strategy.


getTokenizer

protected ResettableTokenizer getTokenizer()
                                    throws TokenizerException
Returns a Tokenizer as the source for the recognizer.

Returns:
the source for the recognizer
Throws:
TokenizerException - if an error occurs

getTrie

protected Trie<CharSequence,Set<CharSequence>> getTrie()
Returns a Trie instance. See the contract in configureTrie().

Returns:
a Trie instance
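
The return type Trie<CharSequence,Set<CharSequence>> suggests a structure that maps a dictionary term to a set of identifiers. The following is a minimal sketch of that idea only. SimpleTrieSketch, its put and get methods, and the sample terms and identifiers are hypothetical and are not the net.sf.eos Trie API, whose methods are not listed on this page.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for a Trie<CharSequence, Set<CharSequence>>; not the net.sf.eos API. */
public class SimpleTrieSketch {

    /** One trie node: child links keyed by character, plus an optional value. */
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        Set<CharSequence> value; // null when no dictionary entry ends at this node
    }

    private final Node root = new Node();

    /** Associates an identifier with a dictionary term, creating nodes as needed. */
    public void put(CharSequence term, CharSequence id) {
        Node node = root;
        for (int i = 0; i < term.length(); i++) {
            node = node.children.computeIfAbsent(term.charAt(i), c -> new Node());
        }
        if (node.value == null) {
            node.value = new LinkedHashSet<>();
        }
        node.value.add(id);
    }

    /** Returns the identifiers stored for the exact term, or null if the term is absent. */
    public Set<CharSequence> get(CharSequence term) {
        Node node = root;
        for (int i = 0; i < term.length() && node != null; i++) {
            node = node.children.get(term.charAt(i));
        }
        return node == null ? null : node.value;
    }

    public static void main(String[] args) {
        SimpleTrieSketch trie = new SimpleTrieSketch();
        trie.put("insulin", "ID:1"); // sample data, invented for illustration
        trie.put("insulin", "ID:2");
        System.out.println(trie.get("insulin")); // identifiers for a known term
        System.out.println(trie.get("insul"));   // a bare prefix carries no value
    }
}
```

A set-valued trie lets one surface form resolve to several identifiers, which is one plausible reason for the Set<CharSequence> value type.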

configure

public void configure(JobConf conf)
Sets the configuration and calls configureTrie().

Specified by:
configure in interface JobConfigurable
Overrides:
configure in class EosDocumentSupportMapReduceBase
Parameters:
conf - the configuration
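
The call order described here (configure stores the configuration, then calls configureTrie, after which getTrie returns the result) can be sketched in isolation. This is an illustration under stated assumptions, not the actual implementation: ConfigureLifecycleSketch is hypothetical, a plain Map stands in for both JobConf and Trie, and the "dictionary.entry" key is invented.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the configure() -> configureTrie() -> getTrie() lifecycle. */
public class ConfigureLifecycleSketch {

    private Map<String, String> conf;                   // stand-in for JobConf
    private Map<CharSequence, Set<CharSequence>> trie;  // stand-in for Trie

    /** Stores the configuration, then builds the lookup structure. */
    public void configure(Map<String, String> conf) {
        this.conf = conf;
        configureTrie();
    }

    /** Builds the lookup structure; a subclass could override this to use another source. */
    protected void configureTrie() {
        Map<CharSequence, Set<CharSequence>> built = new HashMap<>();
        // "dictionary.entry" is an invented key holding one "term=id" pair.
        String entry = conf.get("dictionary.entry");
        if (entry != null && entry.contains("=")) {
            String[] parts = entry.split("=", 2);
            built.put(parts[0], Collections.<CharSequence>singleton(parts[1]));
        }
        this.trie = built;
    }

    /** Returns the structure built by configureTrie(); null before configure() is called. */
    protected Map<CharSequence, Set<CharSequence>> getTrie() {
        return trie;
    }

    public static void main(String[] args) {
        ConfigureLifecycleSketch mapper = new ConfigureLifecycleSketch();
        Map<String, String> conf = new HashMap<>();
        conf.put("dictionary.entry", "insulin=ID:1");
        mapper.configure(conf);
        System.out.println(mapper.getTrie().get("insulin"));
    }
}
```

Keeping the build step in a protected method means a subclass can change how the dictionary is loaded without touching configure itself.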

close

public void close()
           throws IOException
Specified by:
close in interface Closeable
Overrides:
close in class EosDocumentSupportMapReduceBase
Throws:
IOException


Copyright © 2008. All Rights Reserved.