PageRank parameters (pagerank)
- corpus
- The pathname of the file or directory containing documents to index. Specified as <corpus>/path/to/file_or_directory</corpus> in the parameter file and as -corpus=/path/to/file_or_directory on the command line.
- links
- The pathname of the directory containing sorted links data for the documents specified in corpus, produced by the harvestlinks program. Specified as <links>/path/to/links</links> in the parameter file and as -links=/path/to/links on the command line.
- output
- Basename for the output files.
- index
- Index to use to obtain the collection size and internal document ids. Default is none, in which case the corpus is scanned to count the documents and the string document ids are used.
- docs
- Number of documents to process per iteration. Default is 1000. This parameter is ignored if an index parameter is provided; in that case all documents are used in each iteration.
- iters
- Number of iterations to use when estimating the PageRank. Default is 10 if no index parameter is provided, otherwise 100.
- c
- Dampening parameter. Default is 0.5 if no index parameter is provided, otherwise 0.85.
- writeRaw
- Write the raw PageRank scores to <output>.raw.
- writeRanks
- Write the integer PageRank scores [1..10] to <output>.ranks.
- writePriors
- Write the log probability PageRank scores to <output>.priors. This data file is suitable for input to the makeprior application.
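The two ways of specifying a parameter described above (an XML element in the parameter file, or a -name=value argument on the command line) can be illustrated with a small sketch. It is not part of Lemur. It assumes that the parameter file wraps its entries in a <parameters> root element, and that output, iters and c follow the same two forms documented for corpus and links. The helper names and all paths and values are placeholders chosen for illustration.

```python
import xml.etree.ElementTree as ET


def build_parameter_file(params):
    """Return parameter-file XML text with one child element per parameter."""
    # Assumption: entries are wrapped in a <parameters> root element.
    root = ET.Element("parameters")
    for name, value in params.items():
        ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


def build_command_line(params):
    """Return the equivalent list of -name=value command-line arguments."""
    return [f"-{name}={value}" for name, value in params.items()]


# Placeholder paths and values, for illustration only.
example_params = {
    "corpus": "/path/to/file_or_directory",
    "links": "/path/to/links",
    "output": "/path/to/output_basename",
    "iters": 10,
    "c": 0.5,
}

if __name__ == "__main__":
    print(build_parameter_file(example_params))
    print(" ".join(build_command_line(example_params)))
```

Both helpers are built from the same dictionary, so the parameter file and the command-line arguments they produce always describe the same settings.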
Generated on Tue Jun 15 11:02:58 2010 for Lemur by
1.3.4