smoothSupportFile
, see below) is needed for retrieval with a smoothed unigram language model. Each entry in this support file corresponds to one document and records two pieces of information: (a) the count of unique terms in the document; (b) the sum of the collection language model probabilities for the words in the document. The other file (with an extra suffix ".mc") is needed if you run feedback based on the Markov chain query model. Each line in this file contains a term and the sum of the probability of that term given each document in the collection (i.e., the sum of p(w|d) over all possible documents d).

To run the application, follow the general steps for running a Lemur application and set the following variables in the parameter file:
index: the table-of-contents (TOC) record file of the index.
smoothSupportFile: file path for the support file (e.g., /usr0/mydata/index.supp).
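
To make the contents of the two support files more concrete, the following is a minimal Python sketch of the statistics described above. It is not the Lemur implementation and does not read or write Lemur's file formats. The function name smoothing_support is hypothetical, and the sketch makes three assumptions: p(w|d) is the maximum-likelihood estimate count(w, d) / |d|, the collection language model is count(w, collection) / |collection|, and the per-document sum in (b) runs over the unique terms of the document.

```python
from collections import Counter


def smoothing_support(docs):
    """Compute illustrative smoothing-support statistics.

    docs: list of documents, each a list of term strings.
    Returns (per_doc, mc) where
      per_doc[i] = (unique term count, sum of collection LM probabilities
                    over the unique terms of document i)
      mc[w]      = sum over all documents d of p(w|d)
    Assumption: both language models are maximum-likelihood estimates.
    """
    doc_counts = [Counter(doc) for doc in docs]

    # Collection language model: p(w|C) = count(w, C) / |C|
    coll_counts = Counter()
    for counts in doc_counts:
        coll_counts.update(counts)
    coll_len = sum(coll_counts.values())
    p_coll = {w: n / coll_len for w, n in coll_counts.items()}

    # One entry per document, as in the main support file.
    per_doc = []
    for counts in doc_counts:
        unique_terms = len(counts)
        coll_mass = sum(p_coll[w] for w in counts)
        per_doc.append((unique_terms, coll_mass))

    # One entry per term, as in the ".mc" file: sum of p(w|d) over all d.
    mc = {}
    for counts in doc_counts:
        doc_len = sum(counts.values())
        for w, n in counts.items():
            mc[w] = mc.get(w, 0.0) + n / doc_len

    return per_doc, mc


if __name__ == "__main__":
    toy_docs = [["a", "b", "a"], ["b", "c"]]
    per_doc, mc = smoothing_support(toy_docs)
    for i, (unique_terms, coll_mass) in enumerate(per_doc):
        print(i, unique_terms, coll_mass)
    for term in sorted(mc):
        print(term, mc[term])
```

Precomputing these sums once per index lets retrieval and feedback reuse them rather than scanning every document at query time, which is presumably why the application writes them to separate support files.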