net.sf.eos.hadoop.mapred.cooccurrence
Class DictionaryBasedEntityRecognizerMapper
java.lang.Object
org.apache.hadoop.mapred.MapReduceBase
net.sf.eos.hadoop.mapred.EosDocumentSupportMapReduceBase
net.sf.eos.hadoop.mapred.cooccurrence.DictionaryBasedEntityRecognizerMapper
- All Implemented Interfaces:
- Closeable, JobConfigurable, Mapper<LongWritable,Text,Text,Text>
public class DictionaryBasedEntityRecognizerMapper
- extends EosDocumentSupportMapReduceBase
- implements Mapper<LongWritable,Text,Text,Text>
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
DictionaryBasedEntityRecognizerMapper
public DictionaryBasedEntityRecognizerMapper()
map
public void map(LongWritable positionInFile,
Text eosDoc,
OutputCollector<Text,Text> outputCollector,
Reporter reporter)
throws IOException
- Specified by:
map
in interface Mapper<LongWritable,Text,Text,Text>
- Throws:
IOException
configureTrie
protected void configureTrie()
- Configures the trie. After this method finishes, getTrie() should return the configured Trie instance.
Uses the value of DistributedCacheStrategy.STRATEGY_IMPL_CONFIG_NAME, if set, to select the distributed cache strategy.
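The "use the configured value if set" behaviour can be illustrated with a minimal sketch. It is not the eos implementation: the CacheStrategy interface, both strategy classes, the key string, and the use of java.util.Properties in place of JobConf are hypothetical stand-ins chosen so the example runs without Hadoop on the classpath. It assumes the configured value is a class name and that a default is used when the key is absent.

```java
import java.util.Properties;

// Hypothetical stand-in for a cache strategy; not an eos or Hadoop type.
interface CacheStrategy {
    String describe();
}

// Hypothetical fallback used when the configuration key is not set.
class DefaultStrategy implements CacheStrategy {
    public String describe() { return "default"; }
}

// Hypothetical alternative that a job could select by class name.
class CustomStrategy implements CacheStrategy {
    public String describe() { return "custom"; }
}

class StrategySelectionSketch {
    // Assumed key name; the real value of STRATEGY_IMPL_CONFIG_NAME is not given here.
    static final String STRATEGY_KEY = "example.strategy.impl";

    // Returns the strategy named in the configuration, or the default if the key is unset.
    static CacheStrategy select(Properties conf) {
        String className = conf.getProperty(STRATEGY_KEY);
        if (className == null) {
            return new DefaultStrategy();
        }
        try {
            return Class.forName(className)
                    .asSubclass(CacheStrategy.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot create strategy " + className, e);
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(select(conf).describe());
        conf.setProperty(STRATEGY_KEY, CustomStrategy.class.getName());
        System.out.println(select(conf).describe());
    }
}
```

Wrapping the reflective failure in an unchecked exception is a design choice of this sketch only; how the real class reports a misconfigured strategy is not stated on this page.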
getTokenizer
protected ResettableTokenizer getTokenizer()
throws TokenizerException
- Returns a Tokenizer as source for the recognizer.
- Returns:
- the source for the recognizer
- Throws:
TokenizerException - if an error occurs
getTrie
protected Trie<CharSequence,Set<CharSequence>> getTrie()
- Returns a Trie instance. See the contract in configureTrie().
- Returns:
- a Trie instance
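To make the shape Trie<CharSequence,Set<CharSequence>> more concrete, here is a small self-contained sketch of a character trie that maps a key to a set of values. It is not the eos Trie class: SimpleTrieSketch, its put and get methods, and the reading of keys as dictionary terms and values as entity identifiers are assumptions made for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical minimal trie; the real eos Trie API may differ.
class SimpleTrieSketch {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        final Set<CharSequence> values = new LinkedHashSet<>();
    }

    private final Node root = new Node();

    // Adds a value under the given key, creating nodes character by character.
    void put(CharSequence key, CharSequence value) {
        Node node = root;
        for (int i = 0; i < key.length(); i++) {
            node = node.children.computeIfAbsent(key.charAt(i), c -> new Node());
        }
        // Stored as String so that set membership uses String equality.
        node.values.add(value.toString());
    }

    // Returns the values stored for exactly this key, or an empty set if there are none.
    Set<CharSequence> get(CharSequence key) {
        Node node = root;
        for (int i = 0; i < key.length(); i++) {
            node = node.children.get(key.charAt(i));
            if (node == null) {
                return Collections.emptySet();
            }
        }
        return Collections.unmodifiableSet(node.values);
    }

    public static void main(String[] args) {
        SimpleTrieSketch trie = new SimpleTrieSketch();
        trie.put("apple", "ID1");
        trie.put("apple", "ID2");
        System.out.println(trie.get("apple"));
        System.out.println(trie.get("app"));
    }
}
```

A set-valued trie lets one term point to several identifiers, which is one plausible reason for the Set<CharSequence> value type in the signature above.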
configure
public void configure(JobConf conf)
- Sets the configuration and calls configureTrie().
- Specified by:
configure
in interface JobConfigurable
- Overrides:
configure
in class EosDocumentSupportMapReduceBase
- Parameters:
conf - the configuration
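The ordering described for configure (store the configuration, then call configureTrie(), after which getTrie() is usable) can be sketched as follows. Everything here is a hypothetical stand-in: BaseSketch replaces EosDocumentSupportMapReduceBase, a plain Map replaces both JobConf and the Trie type, and the "example.terms" key is invented for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical base class that only remembers the configuration.
class BaseSketch {
    protected Map<String, String> conf;

    public void configure(Map<String, String> conf) {
        this.conf = conf;
    }
}

// Hypothetical subclass mirroring the configure -> configureTrie -> getTrie order.
class RecognizerLifecycleSketch extends BaseSketch {
    private Map<String, Set<String>> trie;

    // Builds the lookup structure from the stored configuration.
    protected void configureTrie() {
        trie = new HashMap<>();
        String terms = conf.getOrDefault("example.terms", "");
        for (String term : terms.split(",")) {
            if (!term.isEmpty()) {
                trie.put(term, Collections.singleton("id:" + term));
            }
        }
    }

    // Null until configure has run, non-null afterwards.
    protected Map<String, Set<String>> getTrie() {
        return trie;
    }

    @Override
    public void configure(Map<String, String> conf) {
        super.configure(conf); // set the configuration first
        configureTrie();       // then build the trie from it
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("example.terms", "apple,pear");
        RecognizerLifecycleSketch mapper = new RecognizerLifecycleSketch();
        mapper.configure(conf);
        System.out.println(mapper.getTrie().keySet());
    }
}
```

Keeping configureTrie() as a separate protected method means a subclass can change how the trie is built without touching configure itself.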
close
public void close()
throws IOException
- Specified by:
close
in interface Closeable
- Overrides:
close
in class EosDocumentSupportMapReduceBase
- Throws:
IOException
Copyright © 2008. All Rights Reserved.