net.sf.eos.trie (net.sf.eos-toolkit.core 0.1.0-SNAPSHOT API)

Package net.sf.eos.trie

Contains the base structure for memory-based entity recognition.

Interface Summary
PatriciaTrie.KeyAnalyzer<K>	Defines the interface to analyze `Trie` keys on a bit level.
Trie<K,V>	Defines the interface for a prefix tree, an ordered tree data structure.
Trie.Cursor<K,V>	An interface used by a `Trie`.
TrieLoader<K,V>	Implementations create new tries.
TrieSource
TrieSource.TrieEntryListener

Class Summary
AbstractTrieLoader<K,V>
ByteArrayKeyAnalyzer
CharSequenceKeyAnalyzer	Analyzes `CharSequence` keys with case sensitivity.
EmptyIterator	Provides an unmodifiable empty iterator.
PatriciaTrie<K,V>	A PATRICIA Trie.
TrieHandler
TrieSource.TrieEntry	Represents an entry in the Trie.
TrieSource.TrieEntryEvent
TrieUtils	Miscellaneous utilities for Tries.
UnmodifiableIterator<E>	A convenience class to aid in developing iterators that cannot be modified.
XmlTrieLoader	The builder creates a trie from a simple XML file.

Enum Summary
Trie.Cursor.SelectStatus	The mode during selection.

Package net.sf.eos.trie Description

Contains the base structure for memory-based entity recognition. The trie is based on the PATRICIA implementation of the Limewire project. The implementation is provided under the terms of version 3 of the GNU General Public License (GPL).

The main reason for a memory-based implementation of entity recognition is the cluster structure of the Hadoop system. In such a system it is counterproductive to have a central instance for entity recognition. A central system always becomes the bottleneck when it is under load from a few hundred cluster nodes, each with X running instances. A PATRICIA trie structure consumes less main memory than other implementations.
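A PATRICIA trie keeps its node count low by branching only at bit positions where stored keys actually differ, which is why keys are analyzed on a bit level. The following is a minimal sketch of that idea for byte array keys. The class `BitKeySketch` and its methods are hypothetical names chosen for illustration and are not the `PatriciaTrie.KeyAnalyzer` API of this package.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not part of net.sf.eos.trie.
public class BitKeySketch {

    /**
     * Returns true if the bit at bitIndex is set. Index 0 is the most
     * significant bit of key[0]. Bits past the end of the key read as 0.
     */
    static boolean isBitSet(byte[] key, int bitIndex) {
        int byteIndex = bitIndex / 8;
        if (byteIndex >= key.length) {
            return false;
        }
        int mask = 0x80 >>> (bitIndex % 8);
        return (key[byteIndex] & mask) != 0;
    }

    /**
     * Returns the index of the first bit at which the two keys differ,
     * or -1 if they are equal once the shorter key is padded with 0 bits.
     */
    static int firstDifferingBit(byte[] a, byte[] b) {
        int bits = Math.max(a.length, b.length) * 8;
        for (int i = 0; i < bits; i++) {
            if (isBitSet(a, i) != isBitSet(b, i)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] car = "car".getBytes(StandardCharsets.US_ASCII);
        byte[] cat = "cat".getBytes(StandardCharsets.US_ASCII);
        // The shared prefix "ca" produces no branch; only this bit does.
        System.out.println(firstDifferingBit(car, cat));
    }
}
```

In this sketch, two keys with a long common prefix are separated by a single bit test instead of one node per shared character.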

To work with the trie in a cluster environment, use the service offered by AbstractTrieLoader. The default serialization format is defined in XmlTrieLoader. At this time the trie's key structure is based on CharSequences. This implementation is not as memory-optimized as the byte array implementation. The byte array oriented key analyzer may use CharSequences transformed into UTF-8 bytes. This saves memory for Latin-based languages.
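The memory difference between character-based and UTF-8 byte keys can be estimated with the JDK alone. The sketch below is an illustration, not part of this package: `KeySizeSketch` and its methods are hypothetical names, and the estimate assumes two bytes per UTF-16 code unit while ignoring object headers and any JVM string compaction.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not part of net.sf.eos.trie.
public class KeySizeSketch {

    /** Payload size when the key is held as UTF-16 code units (2 bytes each). */
    static int utf16Bytes(CharSequence key) {
        return key.length() * 2;
    }

    /** Payload size when the key is transformed into UTF-8 bytes. */
    static int utf8Bytes(CharSequence key) {
        return key.toString().getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // ASCII, Latin with one umlaut, and a CJK key for contrast.
        String[] keys = {"Hadoop", "K\u00f6ln", "\u6771\u4eac"};
        for (String key : keys) {
            System.out.println(key + ": UTF-16 = " + utf16Bytes(key)
                    + " bytes, UTF-8 = " + utf8Bytes(key) + " bytes");
        }
    }
}
```

Under these assumptions, ASCII keys shrink to half their size in UTF-8, while keys in scripts that need three UTF-8 bytes per character grow, which matches the restriction of the saving to Latin-based languages.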

On Hadoop, use Hadoop's distributed cache mechanism to distribute the trie. See net.sf.eos.hadoop for further information.

Since: 0.1.0
Author: Sascha Kohlmann
See Also: net.sf.eos.hadoop, net.sf.eos.entity, net.sf.eos.hadoop.mapred.cooccurrence