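memory -- the amount of memory to use, for example 100M. Specified as <memory>100M</memory> in the parameter file and as -memory=100M on the command line.
corpus -- a complex element describing a corpus to index. Its subelements are:
path -- the file or directory containing the documents to index. Specified as <corpus><path>/path/to/file_or_directory</path></corpus> in the parameter file and as -corpus.path=/path/to/file_or_directory on the command line.
class -- the class of the documents in path. Specified as <corpus><class>trecweb</class></corpus> in the parameter file and as -corpus.class=trecweb on the command line. The known classes include trecweb.
annotations -- the pathname of a file containing annotations for the documents specified in path. Specified as <corpus><annotations>/path/to/file</annotations></corpus> in the parameter file and as -corpus.annotations=/path/to/file on the command line.
metadata -- the pathname of a file containing metadata for the documents specified in path. Specified as <corpus><metadata>/path/to/file</metadata></corpus> in the parameter file and as -corpus.metadata=/path/to/file on the command line.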
Combining the first two of these elements, the parameter file would contain:
<corpus>
<path>/path/to/file_or_directory</path>
<class>trecweb</class>
</corpus>
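The correspondence between dotted command-line names and nested parameter-file elements can be sketched in Python. flags_to_parameters is a hypothetical helper and is not part of the toolkit. It assumes the parameter file's root element is <parameters>. It also merges every flag that shares a prefix into a single element, so it does not handle elements that appear more than once.

```python
import xml.etree.ElementTree as ET


def flags_to_parameters(flags):
    """Convert flags like '-corpus.path=/data' into parameter-file XML.

    Hypothetical helper; assumes a <parameters> root element.
    """
    root = ET.Element("parameters")
    for flag in flags:
        # '-corpus.path=/data' -> name 'corpus.path', value '/data'
        name, _, value = flag.lstrip("-").partition("=")
        parts = name.split(".")
        parent = root
        # Reuse an existing element at each enclosing level, e.g. one <corpus>.
        for part in parts[:-1]:
            child = parent.find(part)
            if child is None:
                child = ET.SubElement(parent, part)
            parent = child
        # The last dotted component is a leaf element holding the value.
        ET.SubElement(parent, parts[-1]).text = value
    return ET.tostring(root, encoding="unicode")


print(flags_to_parameters([
    "-corpus.path=/path/to/file_or_directory",
    "-corpus.class=trecweb",
]))
# <parameters><corpus><path>/path/to/file_or_directory</path><class>trecweb</class></corpus></parameters>
```

The output matches the <corpus> fragment above, wrapped in the assumed root element.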
metadata -- a complex element listing the fields to store as metadata. It accepts three kinds of subelement:
field -- Make the named field available for retrieval as metadata. Specified as <metadata><field>fieldname</field></metadata> in the parameter file and as -metadata.field=fieldname on the command line.
forward -- Make the named field available for retrieval as metadata and build a lookup table that makes retrieving the value more efficient. Specified as <metadata><forward>fieldname</forward></metadata> in the parameter file and as -metadata.forward=fieldname on the command line. The external document id field "docno" is automatically added as a forward metadata field.
backward -- Make the named field available for retrieval as metadata and build a lookup table for inverse lookup of documents based on the value of the field. Specified as <metadata><backward>fieldname</backward></metadata> in the parameter file and as -metadata.backward=fieldname on the command line. The external document id field "docno" is automatically added as a backward metadata field.
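The following sketch models forward and backward metadata as in-memory Python dictionaries to show how the two differ. The function build_lookup_tables and the sample documents are hypothetical, and the sketch does not reflect how the indexer actually stores these tables.

```python
from collections import defaultdict


def build_lookup_tables(docs, field):
    """Conceptual forward (id -> value) and backward (value -> ids) tables.

    docs maps an internal document id to a dict of metadata fields.
    Illustration only; not the indexer's storage format.
    """
    forward = {}
    backward = defaultdict(list)
    for doc_id, fields in sorted(docs.items()):
        if field in fields:
            value = fields[field]
            forward[doc_id] = value          # fast value retrieval by id
            backward[value].append(doc_id)   # inverse lookup by value
    return forward, dict(backward)


docs = {  # hypothetical sample documents
    1: {"docno": "WSJ-001", "title": "Markets"},
    2: {"docno": "WSJ-002", "title": "Markets"},
    3: {"docno": "WSJ-003"},
}
forward, backward = build_lookup_tables(docs, "title")
print(forward[2])           # Markets
print(backward["Markets"])  # [1, 2]
```

A backward table on "docno" is what lets an external document id be mapped back to the internal one.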
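field -- a complex element specifying a field to index. Its subelements are:
name -- the field name. Specified as <field><name>fieldname</name></field> in the parameter file and as -field.name=fieldname on the command line.
numeric -- the symbol true if the field contains numeric data, otherwise the symbol false. Specified as <field><numeric>true</numeric></field> in the parameter file and as -field.numeric=true on the command line. This is an optional parameter, defaulting to false. Note that 0 can be used for false and 1 can be used for true.
stemmer -- the stemming algorithm to use, given in the subelement name. Specified as <stemmer><name>stemmername</name></stemmer> in the parameter file and as -stemmer.name=stemmername on the command line. This is an optional parameter with the default of no stemming.
A separate boolean option takes true to perform case normalization when indexing, or false to index with mixed case. The default is true.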
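The boolean convention (true or 1, false or 0) and the false default for numeric can be captured in a small parser. parse_flag_bool and field_is_numeric are hypothetical helpers. The parser accepts only those four spellings, and whether the toolkit accepts other spellings is not covered here.

```python
def parse_flag_bool(text):
    """Interpret a boolean parameter value: true/1 or false/0.

    Hypothetical helper; rejects any other spelling.
    """
    value = text.strip()
    if value in ("true", "1"):
        return True
    if value in ("false", "0"):
        return False
    raise ValueError(f"not a boolean parameter value: {text!r}")


def field_is_numeric(field_params):
    """numeric is optional and defaults to false."""
    return parse_flag_bool(field_params.get("numeric", "false"))


print(field_is_numeric({"name": "title"}))                 # False
print(field_is_numeric({"name": "year", "numeric": "1"}))  # True
```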
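stopper -- a complex element containing word subelements, one per stopword. Specified as <stopper><word>stopword</word></stopper> in the parameter file and as -stopper.word=stopword on the command line. This is an optional parameter with the default of no stopping.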
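The following sketch shows how case normalization interacts with a stopword list. With normalization on, "The" is lower-cased and then stopped. With mixed case, it is kept. index_terms is hypothetical, and the word-character tokenizer and the normalize-then-stop order are assumptions rather than the indexer's documented behavior.

```python
import re


def index_terms(text, normalize=True, stopwords=frozenset()):
    """Tokenize, optionally lower-case (case normalization), then stop.

    The word-character tokenizer and the normalize-then-stop order
    are assumptions made for this sketch.
    """
    tokens = re.findall(r"\w+", text)
    if normalize:
        tokens = [t.lower() for t in tokens]
    return [t for t in tokens if t not in stopwords]


stop = {"the", "and"}
print(index_terms("The Cat and the Hat", True, stop))   # ['cat', 'hat']
print(index_terms("The Cat and the Hat", False, stop))  # ['The', 'Cat', 'Hat']
```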
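index -- the path to a repository. Specified as <index>/path/to/repository</index> in the parameter file and as -index=/path/to/repository on the command line. This element can be specified multiple times to combine repositories.
server -- the hostname of a server to query. Specified as <server>hostname</server> in the parameter file and as -server=hostname on the command line. The hostname can include an optional port number to connect to, using the form hostname:portnum. This element can be specified multiple times to combine servers.
count -- the maximum number of results to return for each query. Specified as <count>number</count> in the parameter file and as -count=number on the command line.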
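A server value of the form hostname:portnum can be split as follows. parse_server is a hypothetical helper. No default port is stated here, so the caller supplies one through default_port. IPv6 literals are not handled.

```python
def parse_server(spec, default_port=None):
    """Split 'hostname' or 'hostname:portnum' into (hostname, port).

    Hypothetical helper; default_port is chosen by the caller.
    IPv6 literals are not handled.
    """
    host, sep, port = spec.rpartition(":")
    if not sep:
        return spec, default_port
    return host, int(port)


servers = ["search1", "search2:9000"]  # hypothetical hosts
print([parse_server(s) for s in servers])
# [('search1', None), ('search2', 9000)]
```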
rule -- specifies a smoothing rule. The format of the rule is:
( key ":" value ) [ "," key ":" value ]*
Here's an example rule in command line format:
-rule=method:linear,collectionLambda:0.2,field:title
and in parameter file format:
<rule>method:linear,collectionLambda:0.2,field:title</rule>
This corresponds to Jelinek-Mercer smoothing with background lambda equal to 0.2, only for items in a title field.
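This sketch assumes the textbook form of Jelinek-Mercer smoothing. A term's document probability is interpolated with its collection (background) probability, with collectionLambda as the background weight:
P(w|D) = (1 - lambda) * tf(w, D) / |D| + lambda * P(w|C)
The function below implements that formula. The toolkit's exact scoring may include refinements not shown, and jelinek_mercer_log_prob is a hypothetical name.

```python
import math


def jelinek_mercer_log_prob(tf, doc_len, coll_prob, collection_lambda=0.2):
    """log P(w|D) with Jelinek-Mercer (linear) smoothing.

    collection_lambda weights the background (collection) model,
    like collectionLambda in the rule above.
    """
    doc_prob = tf / doc_len if doc_len > 0 else 0.0
    mixed = (1 - collection_lambda) * doc_prob + collection_lambda * coll_prob
    return math.log(mixed)


# A term seen 3 times in a 100-word document, with P(w|C) = 0.001:
# 0.8 * 0.03 + 0.2 * 0.001 = 0.0242 (printed value is approximately this)
print(math.exp(jelinek_mercer_log_prob(3, 100, 0.001)))
```

Because of the background term, a term absent from the document (tf = 0) still receives a nonzero probability.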
If nothing is listed for a key, all values are assumed. So, a rule that does not specify a field matches all fields. This makes -rule=method:linear,collectionLambda:0.2 a valid rule.
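Valid keys include method, field, and collectionLambda, as used in the example above.
Valid methods include linear, which selects Jelinek-Mercer smoothing.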
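The rule grammar above can be parsed in a few lines of Python. parse_rule and rule_matches are hypothetical helpers. The first turns a rule string into a dictionary, keeping values as strings. The second applies the convention that a rule with no field key matches every field.

```python
def parse_rule(rule):
    """Parse '( key ":" value ) [ "," key ":" value ]*' into a dict."""
    params = {}
    for item in rule.split(","):
        key, sep, value = item.partition(":")
        if not sep or not key:
            raise ValueError(f"malformed rule item: {item!r}")
        params[key.strip()] = value.strip()
    return params


def rule_matches(params, field):
    """A rule that does not list a field applies to every field."""
    return params.get("field") in (None, field)


title_rule = parse_rule("method:linear,collectionLambda:0.2,field:title")
print(title_rule)  # {'method': 'linear', 'collectionLambda': '0.2', 'field': 'title'}
print(rule_matches(title_rule, "body"))  # False
print(rule_matches(parse_rule("method:linear,collectionLambda:0.2"), "body"))  # True
```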
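stopper -- a complex element containing word subelements, one per stopword. Specified as <stopper><word>stopword</word></stopper> in the parameter file and as -stopper.word=stopword on the command line. This is an optional parameter with the default of no stopping.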
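baseline -- selects a baseline retrieval method, either tfidf or okapi, optionally followed by parameters. Format of the parameter value: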
(tfidf|okapi) [ "," key ":" value ]*
Here's an example in command line format:
-baseline=tfidf,k1:1.0,b:0.3
and in parameter file format:
<baseline>tfidf,k1:1.0,b:0.3</baseline>
Methods: tfidf or okapi.
Parameters (optional): given as key:value pairs after the method name, such as k1 and b in the example above.
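The following sketch parses a baseline value and computes the Okapi BM25 term-frequency component that k1 and b usually control. parse_baseline and bm25_tf are hypothetical. Treating k1 and b as the standard BM25 parameters is an assumption, and the toolkit's full tfidf and okapi scoring is not reproduced.

```python
def parse_baseline(value):
    """Parse '(tfidf|okapi) [ "," key ":" value ]*' into (method, params)."""
    method, *items = value.split(",")
    if method not in ("tfidf", "okapi"):
        raise ValueError(f"unknown baseline method: {method!r}")
    params = {}
    for item in items:
        key, sep, val = item.partition(":")
        if not sep:
            raise ValueError(f"malformed parameter: {item!r}")
        params[key] = float(val)
    return method, params


def bm25_tf(tf, doc_len, avg_doc_len, k1, b):
    """Standard Okapi BM25 term-frequency component (assumed meaning of k1, b)."""
    # b controls how strongly document length normalizes the term frequency.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return tf * (k1 + 1) / (tf + norm)


method, params = parse_baseline("tfidf,k1:1.0,b:0.3")
print(method, params)  # tfidf {'k1': 1.0, 'b': 0.3}
print(bm25_tf(2, 100, 100, params["k1"], params["b"]))  # about 1.333
```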
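queryOffset -- an integer offset applied to query numbers. Specified as <queryOffset>number</queryOffset> in the parameter file and as -queryOffset=number on the command line.
runID -- a string identifying the run in TREC scorable output. Specified as <runID>someID</runID> in the parameter file and as -runID=someID on the command line.
trecFormat -- the symbol true to produce TREC scorable output, otherwise the symbol false. Specified as <trecFormat>true</trecFormat> in the parameter file and as -trecFormat=true on the command line. Note that 0 can be used for false, and 1 can be used for true.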
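TREC scorable output conventionally has six columns per retrieved document: query number, the literal Q0, document id, rank, score, and run id. The sketch below formats results that way. query_number encodes an assumption about queryOffset: the n-th query is numbered queryOffset + n, since the exact rule is not spelled out here. Both helpers are hypothetical.

```python
def query_number(query_offset, position):
    """Assumption: the n-th query (1-based) is numbered query_offset + n."""
    return query_offset + position


def trec_lines(qnum, ranked, run_id):
    """Format (docno, score) pairs, best first, as TREC run lines."""
    return [
        f"{qnum} Q0 {docno} {rank} {score} {run_id}"
        for rank, (docno, score) in enumerate(ranked, start=1)
    ]


qnum = query_number(150, 1)  # hypothetical offset
for line in trec_lines(qnum, [("WSJ-001", -5.2), ("WSJ-002", -6.0)], "myRun"):
    print(line)
# 151 Q0 WSJ-001 1 -5.2 myRun
# 151 Q0 WSJ-002 2 -6.0 myRun
```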
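fbDocs -- the number of documents to use for pseudo-relevance feedback. Specified as <fbDocs>number</fbDocs> in the parameter file and as -fbDocs=number on the command line.
fbTerms -- the number of terms to use for feedback. Specified as <fbTerms>number</fbTerms> in the parameter file and as -fbTerms=number on the command line.
fbMu -- the value of the smoothing parameter mu to use for feedback. Specified as <fbMu>number</fbMu> in the parameter file and as -fbMu=number on the command line.
fbOrigWeight -- the weight, between 0 and 1, given to the original query in the expanded query. Specified as <fbOrigWeight>number</fbOrigWeight> in the parameter file and as -fbOrigWeight=number on the command line.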
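One common way to combine the original query with feedback terms is linear interpolation weighted by fbOrigWeight. This sketch assumes feedback term weights have already been estimated from the top fbDocs documents, which is where fbMu would apply (not shown). It keeps the fbTerms highest-weighted terms, normalizes each side, and mixes them. interpolate is a hypothetical helper, not the toolkit's estimator.

```python
def interpolate(orig_weights, fb_weights, fb_terms, fb_orig_weight):
    """Mix normalized original query weights with the top fb_terms feedback terms."""
    if not 0.0 <= fb_orig_weight <= 1.0:
        raise ValueError("fbOrigWeight must be between 0 and 1")
    # Keep only the fb_terms highest-weighted feedback terms.
    top = sorted(fb_weights.items(), key=lambda kv: kv[1], reverse=True)[:fb_terms]
    fb_total = sum(w for _, w in top)
    orig_total = sum(orig_weights.values())
    mixed = {}
    for term, w in orig_weights.items():
        mixed[term] = mixed.get(term, 0.0) + fb_orig_weight * w / orig_total
    for term, w in top:
        mixed[term] = mixed.get(term, 0.0) + (1 - fb_orig_weight) * w / fb_total
    return mixed


expanded = interpolate(
    {"cat": 1.0, "hat": 1.0},                      # original query terms
    {"cat": 0.5, "feline": 0.3, "pet": 0.2},       # hypothetical feedback weights
    fb_terms=2,
    fb_orig_weight=0.5,
)
print(expanded)  # {'cat': 0.5625, 'hat': 0.25, 'feline': 0.1875}
```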
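memory -- the amount of memory to use, for example 100M. Specified as <memory>100M</memory> in the parameter file and as -memory=100M on the command line.
index -- the path to the repository. Specified as <index>/path/to/repository</index> in the parameter file and as -index=/path/to/repository on the command line.
port -- the port number to use. Specified as <port>number</port> in the parameter file and as -port=number on the command line.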