This application either performs PLSA on a collection, building three probability tables, P(z), P(d|z), and P(w|z), or opens previously built tables and reads them into memory to illustrate their potential use. Here z in Z are the latent variables (categories), d in D are the documents in the collection, and w in W are the terms in the vocabulary over the collection.
The parameter doTrain (true|false) determines whether the tables are constructed or read. The default value is true.
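As an illustration of how the three tables might be used once they are in memory, the sketch below combines them into the joint probability of a (document, term) event, P(d,w) = sum over z of P(z) P(d|z) P(w|z), which is the standard PLSA mixture. It is a minimal Python sketch, not the application's own code: the function name, the nested-list layout, and the toy values are assumptions made for the example.

```python
def joint_probability(p_z, p_d_given_z, p_w_given_z, d, w):
    """Return P(d, w) = sum_z P(z) * P(d|z) * P(w|z).

    Assumed layout (hypothetical, for illustration only):
      p_z[z]            -- P(z)
      p_d_given_z[z][d] -- P(d|z)
      p_w_given_z[z][w] -- P(w|z)
    """
    return sum(
        p_z[z] * p_d_given_z[z][d] * p_w_given_z[z][w]
        for z in range(len(p_z))
    )


# Toy tables: 2 categories, 2 documents, 3 terms. Each row sums to 1.
P_Z = [0.5, 0.5]
P_D_GIVEN_Z = [[0.5, 0.5], [0.25, 0.75]]
P_W_GIVEN_Z = [[0.5, 0.25, 0.25], [0.25, 0.25, 0.5]]

# Joint probability of document 0 and term 0.
p_00 = joint_probability(P_Z, P_D_GIVEN_Z, P_W_GIVEN_Z, 0, 0)

# Summing over every (d, w) event should give 1.
total = sum(
    joint_probability(P_Z, P_D_GIVEN_Z, P_W_GIVEN_Z, d, w)
    for d in range(2)
    for w in range(3)
)
```

Because each table is a proper distribution, the joint probabilities over all (d,w) events sum to 1, which is a quick sanity check after reading tables from disk.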
The other parameters accepted by PLSA are:
- index -- The index to use. Default is none.
- numCats -- The number of latent variables (categories) to use. Default is 20.
- beta -- The value of beta for Tempered EM (TEM). Default is 1.
- betaMin -- The minimum value for beta. TEM iterations stop when beta falls below this value. Default is 0.6.
- eta -- Multiplier used to scale beta before beginning a new set of TEM iterations. Must be less than 1. Default is 0.92.
- annealcue -- Minimum allowed difference between the likelihoods of consecutive iterations. If the difference is less than this value, beta is updated. Default is 0.
- numIters -- Maximum number of iterations to perform. Default is 100.
- numRestarts -- Number of times to recompute with different random seeds. Default is 1.
- testPercentage -- Percentage of events (d,w) to hold out for validation.
- doTrain -- Whether to construct the probability tables or read them in. Default is true.
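The interaction of beta, betaMin, eta, annealcue, and numIters can be sketched as a simple annealing schedule. The Python below is one reading of the parameter descriptions above, not the application's actual control flow. It assumes that the "difference" is the current likelihood minus the previous one, that beta is multiplied by eta whenever that difference is less than annealcue, and that numIters caps the total number of iterations. The function name and the likelihood trace are hypothetical.

```python
def beta_schedule(likelihoods, beta=1.0, beta_min=0.6, eta=0.92,
                  annealcue=0.0, num_iters=100):
    """Return the beta value used at each iteration for a likelihood trace.

    likelihoods is a hypothetical sequence of likelihood values, one per
    iteration. Defaults mirror the documented parameter defaults.
    """
    betas = []
    previous = None
    for iteration, current in enumerate(likelihoods):
        # Stop at the iteration cap or once beta has fallen below betaMin.
        if iteration >= num_iters or beta < beta_min:
            break
        betas.append(beta)
        # Assumption: signed difference, current minus previous.
        if previous is not None and current - previous < annealcue:
            beta *= eta  # scale beta before the next set of iterations
        previous = current
    return betas


# The likelihood drops at iterations 2 and 4, so beta is scaled after each.
trace = [-10.0, -9.0, -9.5, -9.2, -9.3]
schedule = beta_schedule(trace)
```

With the default annealcue of 0, this reading scales beta only when the likelihood gets worse from one iteration to the next. With the default eta and betaMin, beta can be scaled at most seven times before it falls below 0.6.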