Usage

This page describes how to invoke the project's converter.

Medline

To convert Medline Citation Set documents into EosDocuments, execute:

hadoop jar net.sf.eos-toolkit.contrib.converter-<VERSION>-uberjar-executable.jar \
       -inputreader "org.apache.hadoop.streaming.StreamXmlRecordReader,maxrec=1000000,begin=<MedlineCitation ,end=</MedlineCitation>" \
       -mapper net.sf.eos.contrib.converter.medline.MedlineMapper \
       -reducer net.sf.eos.contrib.converter.medline.MedlineReducer \
       -input <SOURCE FOLDER> \
       -output <DESTINATION FOLDER>

VERSION
    The version of the converter implementation.
SOURCE FOLDER
    The source folder containing the Medline Citation Set documents.
DESTINATION FOLDER
    The destination folder to store the converted EosDocuments in.

Note:

The MedlineCitation DTD requires at least one attribute in the <MedlineCitation> element. This is what distinguishes the starting character sequence of the element from that of its parent element <MedlineCitationSet>: the begin marker "<MedlineCitation " ends with a space, so it cannot match "<MedlineCitationSet>". Recall from the XML specification that the delimiter between the tag name and the first attribute is a sequence of whitespace characters (S ::= (#x20 | #x9 | #xD | #xA)+). The implementation assumes, based on practice, that the first whitespace character is #x20. If this assumption causes problems in the future, a different approach will be needed to work around it.
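
The effect of the trailing space in the begin marker can be illustrated with a minimal sketch. It is not part of the converter: the class name RecordStartSketch, its methods, and the sample attribute values are hypothetical and serve only to show which inputs the fixed marker accepts. The regular expression is one possible whitespace-tolerant alternative, shown under the assumption that the record start could be matched by pattern instead of by a fixed string.

```java
import java.util.regex.Pattern;

// Illustrative only: hypothetical helper, not a class of the converter.
class RecordStartSketch {

    // The begin marker from the command above; note the trailing space (#x20).
    static final String BEGIN = "<MedlineCitation ";

    // Hypothetical alternative: the tag name followed by any XML whitespace
    // character (#x20, #x9, #xD or #xA).
    static final Pattern BEGIN_ANY_WS =
            Pattern.compile("<MedlineCitation[ \\t\\r\\n]");

    // True if the text contains the fixed begin marker.
    static boolean matchesFixed(String xml) {
        return xml.contains(BEGIN);
    }

    // True if the text contains the tag name followed by any XML whitespace.
    static boolean matchesAnyWhitespace(String xml) {
        return BEGIN_ANY_WS.matcher(xml).find();
    }

    public static void main(String[] args) {
        String parent = "<MedlineCitationSet>";
        String record = "<MedlineCitation Owner=\"NLM\" Status=\"MEDLINE\">";
        String tabbed = "<MedlineCitation\tOwner=\"NLM\">";

        System.out.println(matchesFixed(parent));         // false
        System.out.println(matchesFixed(record));         // true
        System.out.println(matchesFixed(tabbed));         // false
        System.out.println(matchesAnyWhitespace(tabbed)); // true
        System.out.println(matchesAnyWhitespace(parent)); // false
    }
}
```

The third case shows the limitation described in the note: a record whose tag name is followed by a tab or line break would be missed by the fixed marker, while the pattern-based variant still excludes the parent element.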