εοs-toolkit core - Project Site

Introduction

eos-toolkit core is the base implementation of εos. εos stands for entity oriented search. Eos is also the name of the goddess of the dawn in Greek mythology (Greek: Ηώς), known as Aurora in Roman mythology.

The major task of εos is to identify concordance (index) lists of related named entities from a text corpus. To support this task, εos aims to offer a set of tools and concepts covering the whole processing chain, so that different applications can be built on top of it. A further goal is to offer an out-of-the-box implementation for a common use case.

Possible applications of the entity oriented search in unstructured text with or without metadata are:

Enrich news search: Based on the concordance in a timeline-oriented search, εos should offer nearby named entities. E.g. in April 2008, searching for "Hillary Clinton" may offer you the concordance of the named entities "Barack Obama" and "John McCain".
Explore timeline-based named entity occurrence: This may be a use case for researchers in the biomedical domain. Explore the named entity "Dopamine" in a timeline-based context with "Parkinson's disease". What is an upcoming named entity in your research domain?
Improve lexicon viewing: Offer the user of an encyclopedia entries that are related to the entry being viewed.
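
The news search example can be illustrated with a minimal sketch. It is not code from eos-toolkit; the class `ConcordanceSketch` and the method `coOccurrences` are hypothetical names, and each document is assumed to be already reduced to the set of named entities recognised in it. The sketch simply counts, for a query entity, how many documents of one time slice it shares with every other entity.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch, not part of eos-toolkit: naive entity co-occurrence counting. */
public class ConcordanceSketch {

    /**
     * For the query entity, counts how many documents each other entity shares with it.
     * Every document is modelled as the set of named entities found in it (an assumption).
     */
    public static Map<String, Integer> coOccurrences(String query, List<Set<String>> documents) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by entity name
        for (Set<String> entities : documents) {
            if (!entities.contains(query)) {
                continue; // this document does not mention the query entity
            }
            for (String entity : entities) {
                if (!entity.equals(query)) {
                    counts.merge(entity, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up documents standing in for news articles from one month.
        List<Set<String>> april2008 = List.of(
                Set.of("Hillary Clinton", "Barack Obama"),
                Set.of("Hillary Clinton", "Barack Obama", "John McCain"),
                Set.of("John McCain"));
        // prints {Barack Obama=2, John McCain=1}
        System.out.println(coOccurrences("Hillary Clinton", april2008));
    }
}
```

Running the same count per month would give a rough timeline of which entities appear together, which is the idea behind the second use case as well.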

Based on

εos is based on two major open source projects:

Lucene: Lucene is the backbone of the retrieval side. εos relies heavily on the tf-idf and full-text retrieval functions of Lucene.
Hadoop: Hadoop is the backbone of the analysis side of εos. Because creating a Lucene index for the retrieval side of εos takes a long time, Hadoop offers a way to build such an index in a time that is acceptable for the online search business.
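
For readers unfamiliar with tf-idf, the following is a minimal sketch of the textbook weighting (raw term frequency multiplied by the logarithm of the inverse document frequency). It uses plain Java rather than Lucene, the class and method names are hypothetical, and Lucene's own similarity classes use a different, normalised variant of this formula.

```java
import java.util.List;

/** Hypothetical sketch of textbook tf-idf; this is not Lucene's scoring formula. */
public class TfIdfSketch {

    /**
     * Returns tf * ln(N / df), where tf is the number of occurrences of the term
     * in the document, N the corpus size and df the number of documents containing the term.
     */
    public static double tfIdf(String term, List<String> document, List<List<String>> corpus) {
        long tf = document.stream().filter(term::equals).count();
        long df = corpus.stream().filter(d -> d.contains(term)).count();
        if (tf == 0 || df == 0) {
            return 0.0; // term absent from the document or from the whole corpus
        }
        return tf * Math.log((double) corpus.size() / df);
    }

    public static void main(String[] args) {
        // Made-up, already tokenised documents.
        List<String> first = List.of("dopamine", "parkinson", "dopamine");
        List<String> second = List.of("parkinson", "therapy");
        List<List<String>> corpus = List.of(first, second);

        // "dopamine" occurs in only one of two documents, so it gets a positive weight.
        System.out.println(tfIdf("dopamine", first, corpus));
        // "parkinson" occurs in every document, so its weight is 0.0.
        System.out.println(tfIdf("parkinson", first, corpus));
    }
}
```

A term that appears in every document carries no weight, while a term concentrated in few documents is weighted up; this is what makes the measure useful for ranking entities against a background corpus.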

Next Tasks

Create a use-case web service for Wikipedia-based entity oriented search.
Add contribution code to transform Wikipedia into EosDocuments inside a Hadoop cluster.
Improve documentation.
Set up a development environment for better user support (e.g. mailing list, wiki, issue tracker).

Reference

εos is inspired by a paper by Mikhail Bautin and Steven Skiena on entity oriented search. εos and the εos-toolkit architecture are based on a proof-of-concept implementation by Sascha Kohlmann. Experiences in building εos with Hadoop and Lucene are described in Sascha's blog (in German).
