This application will either perform PLSA on a collection, building three probability tables: P(z), P(d|z), and P(w|z) where z in Z are the latent variables (categories), d in D are the documents in the collection, and w in W are the terms in the vocabulary over the collection, or open those tables and read them into memory to illustrate their potential use.
The parameter doTrain (true|false) determines whether the tables are constructed or read. The default value is true.
The other parameters accepted by PLSA are: 
- 
index -- the index to use. Default is none. 
 
- 
numCats -- the number of latent variables (categories) to use. Default is 20. 
 
- 
beta -- The value of beta for Tempered EM (TEM). Default is 1. 
 
- 
betaMin -- The minimum value for beta, TEM iterations stop when beta falls below this value. Default is 0.6. 
 
- 
eta -- Multiplier to scale beta before beginning a new set of TEM iterations. Must be less than 1. Default is 0.92. 
 
- 
annealcue -- Minimum allowed difference between likelihood in consecutive iterations. If the difference is less than this, beta is updated. Default is 0. 
 
- 
numIters -- Maximum number of iterations to perform. Default is 100. 
 
- 
numRestarts -- Number of times to recompute with different random seeds. Default is 1. 
 
- 
testPercentage -- Percentage of events (d,w) to hold out for validation. 
 
- 
doTrain -- whether to construct the probability tables or read them in. Default is true. 
 
Generated on Tue Jun 15 11:02:58 2010 for Lemur by
 
1.3.4