| Interface Summary | |
|---|---|
| PatriciaTrie.KeyAnalyzer<K> | Defines the interface to analyze Trie keys at the bit level. |
| Trie<K,V> | Defines the interface for a prefix tree, an ordered tree data structure. |
| Trie.Cursor<K,V> | An interface used by a Trie. |
| TrieLoader<K,V> | Implementations create new tries. |
| TrieSource | |
| TrieSource.TrieEntryListener | |
| Class Summary | |
|---|---|
| AbstractTrieLoader<K,V> | |
| ByteArrayKeyAnalyzer | |
| CharSequenceKeyAnalyzer | Analyzes CharSequence keys with case sensitivity. |
| EmptyIterator | Provides an unmodifiable empty iterator. |
| PatriciaTrie<K,V> | A PATRICIA Trie. |
| TrieHandler | |
| TrieSource.TrieEntry | Represents an entry in the Trie. |
| TrieSource.TrieEntryEvent | |
| TrieUtils | Miscellaneous utilities for Tries. |
| UnmodifiableIterator<E> | A convenience class to aid in developing iterators that cannot be modified. |
| XmlTrieLoader | The builder creates a trie from a simple XML file. |
| Enum Summary | |
|---|---|
| Trie.Cursor.SelectStatus | The mode during selection. |
Contains the base structures for memory-based entity recognition. The trie is based on the PATRICIA implementation from the LimeWire project. The implementation is licensed under the terms of version 3 of the GNU General Public License (GPL).
The main reason for a memory-based implementation of entity recognition is the cluster structure of the Hadoop system. In such a system a central instance for entity recognition is counterproductive: it becomes the bottleneck as soon as it is hit by a few hundred cluster nodes, each running X instances. A PATRICIA trie consumes less main memory than other implementations.
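A PATRICIA trie stays compact because it branches only at the bit position where keys first differ, rather than keeping a node for every character. The following is a minimal sketch of that bit-level key analysis, assuming 16-bit `char` units; the class `BitIndexSketch` and its methods are hypothetical names for illustration and are not the `PatriciaTrie.KeyAnalyzer` API of this package.

```java
public class BitIndexSketch {
    // Assumption: each char of the key contributes 16 bits.
    static final int BITS_PER_CHAR = 16;

    /**
     * Tests a single bit of the key. Bit 0 is the most significant bit
     * of the first char; bits beyond the end of the key count as 0.
     */
    static boolean isBitSet(CharSequence key, int bitIndex) {
        int charIndex = bitIndex / BITS_PER_CHAR;
        if (charIndex >= key.length()) {
            return false;
        }
        int mask = 0x8000 >>> (bitIndex % BITS_PER_CHAR);
        return (key.charAt(charIndex) & mask) != 0;
    }

    /**
     * Returns the index of the first bit in which the two keys differ,
     * or -1 if no bit differs (missing chars are treated as 0).
     */
    static int firstDifferingBit(CharSequence a, CharSequence b) {
        int maxLen = Math.max(a.length(), b.length());
        for (int i = 0; i < maxLen; i++) {
            int ca = i < a.length() ? a.charAt(i) : 0;
            int cb = i < b.length() ? b.charAt(i) : 0;
            int xor = ca ^ cb;
            if (xor != 0) {
                // numberOfLeadingZeros counts over 32 bits; the upper
                // 16 bits are always zero here, so subtract them.
                return i * BITS_PER_CHAR + Integer.numberOfLeadingZeros(xor) - 16;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstDifferingBit("a", "b"));
        System.out.println(firstDifferingBit("ab", "ac"));
    }
}
```

A trie node only needs to remember such a bit index and test it during lookup, which is why long shared prefixes do not cost additional nodes.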
To work with the trie in a cluster environment, use the service offered by
AbstractTrieLoader. The default serialization format
is defined in XmlTrieLoader. At this time the trie's
key structure is based on CharSequences.
This implementation is not as memory-optimized as the
byte array
implementation. The byte-array-oriented key analyzer can use
CharSequences transformed into UTF-8 bytes.
This saves memory for Latin-based languages.
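To illustrate the saving, the sketch below compares the size of a key counted as UTF-16 code units (what a CharSequence exposes, 2 bytes per `char`) with the size of the same key encoded as UTF-8. The class `KeySizeSketch` and its methods are hypothetical and not part of this package; the actual heap usage of a `String` depends on the JVM, so the numbers are only a rough guide.

```java
import java.nio.charset.StandardCharsets;

public class KeySizeSketch {
    /** Size of the key when every char is counted as a 2-byte UTF-16 code unit. */
    static int utf16Bytes(CharSequence key) {
        return key.length() * 2;
    }

    /** Size of the same key after encoding it as UTF-8. */
    static int utf8Bytes(CharSequence key) {
        return key.toString().getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // "M\u00fcnchen" contains one non-ASCII letter (u with umlaut).
        String[] keys = {"Hadoop", "M\u00fcnchen"};
        for (String key : keys) {
            System.out.println(key + ": UTF-16 = " + utf16Bytes(key)
                    + " bytes, UTF-8 = " + utf8Bytes(key) + " bytes");
        }
    }
}
```

ASCII letters take one byte in UTF-8 and accented Latin letters take two, so keys in Latin-based languages shrink to roughly half. For scripts whose characters need three UTF-8 bytes the byte array form can be larger than the UTF-16 form.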
For Hadoop, use Hadoop's distributed cache mechanism. See
net.sf.eos.hadoop for further information.
See also: net.sf.eos.hadoop, net.sf.eos.entity, net.sf.eos.hadoop.mapred.cooccurrence