net.sf.eos.entity
Class AbstractDictionaryBasedEntityRecognizer

java.lang.Object
  extended by net.sf.eos.analyzer.TokenFilter
      extended by net.sf.eos.entity.AbstractDictionaryBasedEntityRecognizer
All Implemented Interfaces:
Tokenizer, Configurable, DictionaryBasedEntityRecognizer, EntityRecognizer
Direct Known Subclasses:
SimpleLongestMatchDictionaryBasedEntityRecognizer

public abstract class AbstractDictionaryBasedEntityRecognizer
extends TokenFilter
implements EntityRecognizer, Configurable, DictionaryBasedEntityRecognizer

A dictionary-based implementation of EntityRecognizer that identifies entities in a text. An entity may be represented by an ID. The ID acts as a bracket around a collection of literal entity terms or phrases. In the entity map, the key of each entry is an entity literal and the value of the entry holds the ID for that literal.
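
The map layout described above can be sketched with plain java.util collections. This is an illustration only: the class name EntityMapExample, the helper addEntity, and the sample literals and IDs are hypothetical and are not part of the library. It assumes a Java 8 or later runtime.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical example class, not part of net.sf.eos.
public class EntityMapExample {

    /** Registers one literal under an ID, creating the ID set on first use. */
    public static void addEntity(Map<CharSequence, Set<CharSequence>> map,
                                 String literal, String id) {
        // Keys are always String here: CharSequence itself does not define
        // equals/hashCode, so mixing implementations would break lookups.
        map.computeIfAbsent(literal, k -> new LinkedHashSet<>()).add(id);
    }

    /** Builds a small map: key = entity literal, value = set of entity IDs. */
    public static Map<CharSequence, Set<CharSequence>> buildSampleMap() {
        Map<CharSequence, Set<CharSequence>> entities = new HashMap<>();
        // Several literals share one ID: the ID brackets them together.
        addEntity(entities, "New York", "city:nyc");
        addEntity(entities, "NYC", "city:nyc");
        addEntity(entities, "Big Apple", "city:nyc");
        // One literal may also carry more than one ID (ambiguous term).
        addEntity(entities, "Paris", "city:paris-fr");
        addEntity(entities, "Paris", "city:paris-tx");
        return entities;
    }

    public static void main(String[] args) {
        Map<CharSequence, Set<CharSequence>> entities = buildSampleMap();
        System.out.println(entities.get("Paris"));
    }
}
```

A map built this way has the type expected by setEntityMap(Map<CharSequence,Set<CharSequence>>).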

Author:
Sascha Kohlmann

Field Summary
static String ABSTRACT_DICTIONARY_BASED_ENTITY_RECOGNIZER_IMPL_CONFIG_NAME
          The configuration key name for the classname of the factory.
static String MAX_TOKEN_CONFIG_NAME
          Key for the maximum token count.
 
Fields inherited from interface net.sf.eos.entity.DictionaryBasedEntityRecognizer
ENTITY_ID_KEY
 
Constructor Summary
AbstractDictionaryBasedEntityRecognizer(Tokenizer source)
           
 
Method Summary
 void configure(Configuration config)
          Set the configuration to be used by this object.
protected  Configuration getConfiguration()
          Returns the configuration.
 Map<CharSequence,Set<CharSequence>> getEntityMap()
          Return the entity map.
 int getMaxToken()
           
 TextBuilder getTextBuilder()
          Returns the builder that has been set.
static DictionaryBasedEntityRecognizer newInstance(Tokenizer source)
          Creates a new instance of the recognizer.
static DictionaryBasedEntityRecognizer newInstance(Tokenizer source, Configuration config)
          Creates a new instance of the recognizer.
 void setEntityMap(Map<CharSequence,Set<CharSequence>> entities)
          Set the entity map.
 void setMaxToken(int maxToken)
           
 void setTextBuilder(TextBuilder builder)
          Sets a builder.
 
Methods inherited from class net.sf.eos.analyzer.TokenFilter
getSource, next
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

ABSTRACT_DICTIONARY_BASED_ENTITY_RECOGNIZER_IMPL_CONFIG_NAME

@ConfigurationKey(type=CLASSNAME,
                  description="Implementations of a EntityRecognizer to identify entities in a text.")
public static final String ABSTRACT_DICTIONARY_BASED_ENTITY_RECOGNIZER_IMPL_CONFIG_NAME
The configuration key name for the classname of the factory.

See Also:
newInstance(Tokenizer, Configuration), newInstance(Tokenizer), Constant Field Values

MAX_TOKEN_CONFIG_NAME

@ConfigurationKey(type=INTEGER,
                  defaultValue="5",
                  description="The maximum token count for indentifying.")
public static final String MAX_TOKEN_CONFIG_NAME
Key for the maximum token count.

See Also:
Constant Field Values
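
One plausible reading of the maximum token count is that it bounds how many consecutive tokens are joined into a candidate phrase before the dictionary lookup. The sketch below illustrates that idea with a greedy longest-match scan. It is an assumption for illustration, not the code of SimpleLongestMatchDictionaryBasedEntityRecognizer; the class LongestMatchSketch, the method match, and the use of a single space as separator are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not the library's implementation.
public class LongestMatchSketch {

    /**
     * Scans the tokens left to right. At each position it tries the longest
     * phrase first (at most maxToken tokens) and shrinks the window until a
     * dictionary key matches. Returns the matched literals in order.
     */
    public static List<String> match(List<String> tokens,
                                     Map<String, Set<String>> entities,
                                     int maxToken) {
        List<String> found = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int matched = 0;
            int limit = Math.min(maxToken, tokens.size() - i);
            for (int len = limit; len >= 1; len--) {
                // Join the window with a space to form the candidate phrase.
                String phrase = String.join(" ", tokens.subList(i, i + len));
                if (entities.containsKey(phrase)) {
                    found.add(phrase);
                    matched = len;
                    break;
                }
            }
            // Skip past the match, or advance one token if nothing matched.
            i += Math.max(matched, 1);
        }
        return found;
    }
}
```

With the tokens "New", "York", "City" and dictionary keys "New York" and "New York City", a window of 5 yields "New York City", while a window of 2 can only reach "New York". A larger window finds longer phrases at the cost of more lookups per position.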
Constructor Detail

AbstractDictionaryBasedEntityRecognizer

public AbstractDictionaryBasedEntityRecognizer(Tokenizer source)
Method Detail

setEntityMap

public void setEntityMap(Map<CharSequence,Set<CharSequence>> entities)
Description copied from interface: DictionaryBasedEntityRecognizer
Set the entity map.

Specified by:
setEntityMap in interface DictionaryBasedEntityRecognizer
Parameters:
entities - the entity map
See Also:
Trie

getEntityMap

public Map<CharSequence,Set<CharSequence>> getEntityMap()
Description copied from interface: DictionaryBasedEntityRecognizer
Return the entity map.

Specified by:
getEntityMap in interface DictionaryBasedEntityRecognizer
Returns:
the entity map. May be null

setTextBuilder

public void setTextBuilder(TextBuilder builder)
Description copied from interface: DictionaryBasedEntityRecognizer
Sets a builder. The implementation has a default builder, the instance TextBuilder.SPACE_BUILDER, which is set at construction time.

Specified by:
setTextBuilder in interface DictionaryBasedEntityRecognizer
Parameters:
builder - a builder to set or null

getTextBuilder

public TextBuilder getTextBuilder()
Description copied from interface: DictionaryBasedEntityRecognizer
Returns the builder that has been set.

Specified by:
getTextBuilder in interface DictionaryBasedEntityRecognizer
Returns:
the builder that has been set, or null.

getMaxToken

public int getMaxToken()
Specified by:
getMaxToken in interface DictionaryBasedEntityRecognizer
Returns:
the maxToken

setMaxToken

public void setMaxToken(int maxToken)
Specified by:
setMaxToken in interface DictionaryBasedEntityRecognizer
Parameters:
maxToken - the maxToken to set

configure

public void configure(Configuration config)
Description copied from interface: Configurable
Set the configuration to be used by this object. Implementations may create a copy of the parameter.

Specified by:
configure in interface Configurable
Parameters:
config - the configuration

getConfiguration

protected final Configuration getConfiguration()
Returns the configuration.

Returns:
the configuration holder or null

newInstance

@FactoryMethod(key="net.sf.eos.entity.AbstractDictionaryBasedEntityRecognizer.impl",
               implementation=SimpleLongestMatchDictionaryBasedEntityRecognizer.class)
public static final DictionaryBasedEntityRecognizer newInstance(Tokenizer source)
                                                         throws EosException
Creates a new instance of the recognizer. Instantiates the SimpleLongestMatchDictionaryBasedEntityRecognizer.

Parameters:
source - a source tokenizer
Returns:
a new instance
Throws:
EosException - if it is not possible to instantiate an instance

newInstance

@FactoryMethod(key="net.sf.eos.entity.AbstractDictionaryBasedEntityRecognizer.impl",
               implementation=SimpleLongestMatchDictionaryBasedEntityRecognizer.class)
public static final DictionaryBasedEntityRecognizer newInstance(Tokenizer source,
                                                                Configuration config)
                                                         throws EosException
Creates a new instance of the recognizer. If the Configuration contains the key ABSTRACT_DICTIONARY_BASED_ENTITY_RECOGNIZER_IMPL_CONFIG_NAME, a new instance of the class named by its value is instantiated. If no value is set, the SimpleLongestMatchDictionaryBasedEntityRecognizer is instantiated.

Parameters:
source - a source tokenizer
config - the configuration
Returns:
a new instance
Throws:
EosException - if it is not possible to instantiate an instance
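
The lookup-with-fallback behavior described for newInstance(Tokenizer, Configuration) can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the library's code: java.util.Properties stands in for Configuration, Iterator<String> stands in for Tokenizer, IllegalStateException stands in for EosException, and the types Recognizer, DefaultRecognizer and CustomRecognizer are hypothetical. The key string is copied from the @FactoryMethod annotation; that the constant carries the same value is an assumption.

```java
import java.util.Iterator;
import java.util.Properties;

// Hypothetical sketch of a configurable factory with a default implementation.
public class FactorySketch {

    // Key string as shown in the @FactoryMethod annotation.
    public static final String IMPL_KEY =
            "net.sf.eos.entity.AbstractDictionaryBasedEntityRecognizer.impl";

    /** Stand-in for DictionaryBasedEntityRecognizer. */
    public interface Recognizer {
        String name();
    }

    /** Stand-in for the default implementation. */
    public static class DefaultRecognizer implements Recognizer {
        public DefaultRecognizer(Iterator<String> source) { }
        public String name() { return "default"; }
    }

    /** Stand-in for a user-supplied implementation. */
    public static class CustomRecognizer implements Recognizer {
        public CustomRecognizer(Iterator<String> source) { }
        public String name() { return "custom"; }
    }

    /**
     * Reads the class name from the configuration, falls back to the default
     * implementation if the key is absent, and instantiates the class through
     * its single-argument constructor.
     */
    public static Recognizer newInstance(Iterator<String> source, Properties config) {
        String className =
                config.getProperty(IMPL_KEY, DefaultRecognizer.class.getName());
        try {
            return Class.forName(className)
                    .asSubclass(Recognizer.class)
                    .getConstructor(Iterator.class)
                    .newInstance(source);
        } catch (ReflectiveOperationException | ClassCastException e) {
            // The real method reports this case with EosException.
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }
}
```

Resolving the class by name keeps callers independent of the concrete recognizer: an empty configuration yields the default, and setting the key swaps in another implementation without code changes.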


Copyright © 2008. All Rights Reserved.