This application builds a KeyfileIncIndex for a collection of documents whose terms have associated properties.
To use it, follow the general steps for running a Lemur application.
The parameters are: 
- index: name of the index to create (do not include the extension).
- memory: memory, in bytes, for the KeyfileIncIndex cache (default = 128000000).
- stopwords: name of the file containing the stopword list.
- acronyms: name of the file containing the acronym list.
- countStopWords: if true, count stopwords in the document length.
- docFormat: one of:
  - "brill": documents with Brill's part-of-speech tags. Documents still need DOC separators between them, similar to Lemur's WebParser. This is the default.
  - "identifinder": documents with Identifinder's named-entity tags. Documents still need DOC separators between them, similar to Lemur's WebParser.
- stemmer: one of:
  - "porter": Porter stemmer.
  - "krovetz": Krovetz stemmer.
  - "arabic": Arabic stemmer; requires an additional parameter:
    - arabicStemFunc: which stemming algorithm to apply, one of:
      - arabic_stop: arabic_stop
      - arabic_norm2: table normalization
      - arabic_norm2_stop: table normalization with stopping
      - arabic_light10: light9 plus ll prefix
      - arabic_light10_stop: light10 and remove stop words
- dataFiles: name of the file containing the list of data files to index.
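
As an illustration of how the parameters above fit together, the following Python sketch checks a set of values against the documented options and renders them as a parameter file. It assumes the classic Lemur parameter file syntax of one `name = value;` entry per line; verify that against your Lemur version. The helper names (`check_params`, `render_params`) and all sample values (index name, file names) are hypothetical and not part of Lemur.

```python
# Sketch only: validate and render a Lemur-style parameter file.
# Assumption: parameter files use one "name = value;" entry per line.
# Helper names and sample values are hypothetical, not part of Lemur.

DOC_FORMATS = {"brill", "identifinder"}
STEMMERS = {"porter", "krovetz", "arabic"}
ARABIC_STEM_FUNCS = {
    "arabic_stop",
    "arabic_norm2",
    "arabic_norm2_stop",
    "arabic_light10",
    "arabic_light10_stop",
}


def check_params(params):
    """Raise ValueError if a value is not one of the documented options."""
    # "brill" is the documented default when docFormat is omitted.
    doc_format = params.get("docFormat", "brill")
    if doc_format not in DOC_FORMATS:
        raise ValueError(f"unknown docFormat: {doc_format}")

    stemmer = params.get("stemmer")
    if stemmer is not None and stemmer not in STEMMERS:
        raise ValueError(f"unknown stemmer: {stemmer}")

    # The Arabic stemmer needs arabicStemFunc to select an algorithm.
    if stemmer == "arabic":
        stem_func = params.get("arabicStemFunc")
        if stem_func not in ARABIC_STEM_FUNCS:
            raise ValueError(f"arabic stemmer needs a valid arabicStemFunc, got: {stem_func}")


def render_params(params):
    """Return the parameter file text, one 'name = value;' line per entry."""
    check_params(params)
    return "".join(f"{name} = {value};\n" for name, value in params.items())


# Hypothetical example values.
example = {
    "index": "propindex",
    "memory": 128000000,
    "stopwords": "stoplist.txt",
    "countStopWords": "true",
    "docFormat": "identifinder",
    "stemmer": "arabic",
    "arabicStemFunc": "arabic_light10",
    "dataFiles": "filelist.txt",
}

param_text = render_params(example)
print(param_text)
```

The rendered text can be saved to a file and passed to the application in the usual Lemur way. Validating before writing catches a misspelled stemmer or a missing arabicStemFunc before a long indexing run starts.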
Generated on Tue Jun 15 11:02:58 2010 for Lemur by Doxygen 1.3.4