net.sf.eos.hadoop.mapred.index
Class LuceneOutputFormat<K extends WritableComparable,V extends ObjectWritable>

java.lang.Object
  extended by org.apache.hadoop.mapred.OutputFormatBase<K,V>
      extended by net.sf.eos.hadoop.mapred.index.LuceneOutputFormat<K,V>
All Implemented Interfaces:
OutputFormat<K,V>

public class LuceneOutputFormat<K extends WritableComparable,V extends ObjectWritable>
extends OutputFormatBase<K,V>

Supports writing a Lucene index to a Hadoop filesystem.

Parts are copied from the Nutch source code.

Author:
Nutch Team, Sascha Kohlmann

Field Summary
static String DONE_NAME
           
static String MAX_BUFFERED_DOCS_CONFIG_NAME
          The name of the max buffered docs value.
static String MAX_FIELD_LENGTH_CONFIG_NAME
          The maximum field length.
static String MAX_MERGE_DOCS_CONFIG_NAME
          The name of the max merge docs value.
static String MERGE_FACTOR_CONFIG_NAME
          The name of the merge factor value.
static String RAM_BUFFER_SIZE_MB_CONFIG_NAME
          The RAM buffer size in MB.
 
Constructor Summary
LuceneOutputFormat()
           
 
Method Summary
 RecordWriter<K,V> getRecordWriter(FileSystem fileSystem, JobConf job, String name, Progressable progress)
          Configured through the keys of this class whose names end in _CONFIG_NAME.
 
Methods inherited from class org.apache.hadoop.mapred.OutputFormatBase
checkOutputSpecs, getCompressOutput, getOutputCompressorClass, setCompressOutput, setOutputCompressorClass
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

MERGE_FACTOR_CONFIG_NAME

@ConfigurationKey(type=INTEGER,
                  defaultValue="10",
                  description="The merge factory value.")
public static final String MERGE_FACTOR_CONFIG_NAME
The name of the merge factor value. Default value is 10.

See Also:
Constant Field Values

MAX_BUFFERED_DOCS_CONFIG_NAME

@ConfigurationKey(type=INTEGER,
                  defaultValue="10",
                  description="The max buffered docs value.")
public static final String MAX_BUFFERED_DOCS_CONFIG_NAME
The name of the max buffered docs value. Default value is 10.

See Also:
Constant Field Values

MAX_MERGE_DOCS_CONFIG_NAME

@ConfigurationKey(type=INTEGER,
                  defaultValue="2147483647",
                  description="The max merge docs value.")
public static final String MAX_MERGE_DOCS_CONFIG_NAME
The name of the max merge docs value. Default value is Integer.MAX_VALUE.

See Also:
Constant Field Values

RAM_BUFFER_SIZE_MB_CONFIG_NAME

@ConfigurationKey(type=INTEGER,
                  defaultValue="200",
                  description="The RAM buffer size in MB.")
public static final String RAM_BUFFER_SIZE_MB_CONFIG_NAME
The RAM buffer size in MB. Default value is 200.

See Also:
Constant Field Values

MAX_FIELD_LENGTH_CONFIG_NAME

@ConfigurationKey(type=INTEGER,
                  defaultValue="100000",
                  description="The maximum field length.")
public static final String MAX_FIELD_LENGTH_CONFIG_NAME
The maximum field length. Default value is 100000.

See Also:
Constant Field Values

DONE_NAME

public static final String DONE_NAME
See Also:
Constant Field Values
Constructor Detail

LuceneOutputFormat

public LuceneOutputFormat()
Method Detail

getRecordWriter

public RecordWriter<K,V> getRecordWriter(FileSystem fileSystem,
                                         JobConf job,
                                         String name,
                                         Progressable progress)
                                  throws IOException
Configured through the keys of this class whose names end in _CONFIG_NAME. Internally uses instances of AnalyzerSupplier and SimilaritySupplier.

Specified by:
getRecordWriter in interface OutputFormat<K extends WritableComparable,V extends ObjectWritable>
Specified by:
getRecordWriter in class OutputFormatBase<K extends WritableComparable,V extends ObjectWritable>
Throws:
IOException
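
The documented defaults for the _CONFIG_NAME keys can be summarized in a small, self-contained sketch. It is not the class's actual implementation: the class IndexSettingsSketch, its resolve method, and the "sketch.*" key strings are hypothetical stand-ins, and a plain Map takes the place of JobConf. It also assumes that an unset key falls back to the documented default. Only the default values themselves come from the Field Detail section above.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-in only: this class is not part of net.sf.eos or Hadoop.
final class IndexSettingsSketch {

    // Hypothetical key strings. The real ones are the values of the
    // *_CONFIG_NAME constants (see "Constant Field Values").
    static final String MERGE_FACTOR_KEY = "sketch.mergeFactor";
    static final String MAX_BUFFERED_DOCS_KEY = "sketch.maxBufferedDocs";
    static final String MAX_MERGE_DOCS_KEY = "sketch.maxMergeDocs";
    static final String RAM_BUFFER_SIZE_MB_KEY = "sketch.ramBufferSizeMB";
    static final String MAX_FIELD_LENGTH_KEY = "sketch.maxFieldLength";

    // Default values as documented in the Field Detail section.
    private static final Map<String, Integer> DEFAULTS = new HashMap<>();
    static {
        DEFAULTS.put(MERGE_FACTOR_KEY, 10);
        DEFAULTS.put(MAX_BUFFERED_DOCS_KEY, 10);
        DEFAULTS.put(MAX_MERGE_DOCS_KEY, Integer.MAX_VALUE);
        DEFAULTS.put(RAM_BUFFER_SIZE_MB_KEY, 200);
        DEFAULTS.put(MAX_FIELD_LENGTH_KEY, 100000);
    }

    // Returns the configured integer for a key, or its documented default
    // when the key is unset (assumed fallback behavior).
    static int resolve(Map<String, String> conf, String key) {
        Integer fallback = DEFAULTS.get(key);
        if (fallback == null) {
            throw new IllegalArgumentException("Unknown key: " + key);
        }
        String raw = conf.get(key);
        return raw == null ? fallback : Integer.parseInt(raw.trim());
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(MERGE_FACTOR_KEY, "20"); // explicit override
        System.out.println("mergeFactor = " + resolve(conf, MERGE_FACTOR_KEY));
        System.out.println("ramBufferSizeMB = " + resolve(conf, RAM_BUFFER_SIZE_MB_KEY));
    }
}
```

In a real job the overrides would presumably be set on the JobConf passed to getRecordWriter, using the constants of this class as key names, for example job.setInt(LuceneOutputFormat.MERGE_FACTOR_CONFIG_NAME, 20).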


Copyright © 2008. All Rights Reserved.