Offline Clustering

This example application demonstrates the basic offline clustering task. It provides k-means and bisecting k-means partitional clustering, runs each algorithm on the first 100 documents in the index (or on all of them if there are fewer than 100), and prints the results.
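
As a rough illustration of the two partitional methods named above, the following Python sketch clusters a handful of toy vectors using cosine similarity. It is not the Lemur implementation. The function names (kmeans, bisecting_kmeans, pick_seeds), the farthest-first seeding, the choice to always split the largest cluster, and the reading of bk_iters as the iteration cap for each two-way split are all assumptions made for this example.

```python
import math

def cosine(a, b):
    """Cosine similarity of two dense vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def pick_seeds(vectors, k):
    """Assumed seeding: start at vector 0, then keep adding the vector
    that is least similar to the seeds chosen so far."""
    seeds = [0]
    while len(seeds) < k:
        rest = [i for i in range(len(vectors)) if i not in seeds]
        seeds.append(min(
            rest,
            key=lambda i: max(cosine(vectors[i], vectors[s]) for s in seeds),
        ))
    return [list(vectors[i]) for i in seeds]

def kmeans(vectors, num_parts, max_iters):
    """Partition vectors into at most num_parts lists of indices."""
    if not vectors:
        return []
    k = min(num_parts, len(vectors))
    centers = pick_seeds(vectors, k)
    assign = None
    for _ in range(max(1, max_iters)):
        # Assign every vector to its most similar center.
        new_assign = [
            max(range(k), key=lambda c: cosine(v, centers[c])) for v in vectors
        ]
        if new_assign == assign:
            break  # assignments stopped changing
        assign = new_assign
        # Recompute each center as the mean of its members.
        for c in range(k):
            members = [vectors[i] for i, a in enumerate(assign) if a == c]
            if members:  # keep the old center if a cluster goes empty
                centers[c] = centroid(members)
    groups = [[i for i, a in enumerate(assign) if a == c] for c in range(k)]
    return [g for g in groups if g]

def bisecting_kmeans(vectors, num_parts, bk_iters):
    """Split the largest cluster in two until num_parts clusters exist."""
    clusters = [list(range(len(vectors)))]
    while len(clusters) < num_parts:
        clusters.sort(key=len)
        largest = clusters.pop()
        if len(largest) < 2:
            clusters.append(largest)
            break  # nothing left to split
        halves = kmeans([vectors[i] for i in largest], 2, bk_iters)
        if len(halves) < 2:
            clusters.append(largest)
            break  # 2-means could not separate this cluster
        # Map local indices back to indices into the full vector list.
        clusters.extend([largest[i] for i in half] for half in halves)
    return sorted(sorted(c) for c in clusters)

# Toy "documents": three loose groups of term-weight vectors.
docs = [
    [1.0, 0.0, 0.0], [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.2], [0.0, 0.9, 0.3],
    [0.0, 0.2, 1.0], [0.0, 0.3, 0.9],
]
print(kmeans(docs, 3, 100))          # [[0, 1], [2, 3], [4, 5]]
print(bisecting_kmeans(docs, 3, 5))  # [[0, 1], [2, 3], [4, 5]]
```

On this toy data both methods recover the same three groups. On real document vectors they can differ, since bisecting k-means commits to each two-way split before making the next one.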

The parameters accepted by OfflineCluster are:

index -- the index to use. Default is none.
clusterType -- Type of cluster to use, either agglomerative or centroid. Centroid is agglomerative clustering using the mean, which trades memory use for speed of clustering. Default is centroid.
simType -- The similarity metric to use. Default is cosine similarity (COS), which is the only implemented method.
docMode -- The integer encoding of the scoring method to use for the agglomerative cluster type. The default is max (maximum). The choices are:
- max -- Maximum score over documents in a cluster.
- mean -- Mean score over documents in a cluster. This is identical to the centroid cluster type.
- avg -- Average score over documents in a cluster.
- min -- Minimum score over documents in a cluster.
numParts -- Number of partitions to split into. Default is 2.
maxIters -- Maximum number of iterations for k-means. Default is 100.
bkIters -- Number of k-means iterations for bisecting k-means. Default is 5.
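
The four docMode choices can be pictured as different ways of scoring one document against a cluster. The Python sketch below is one plausible reading, not Lemur's code: it assumes that "mean" scores the document against the cluster centroid (consistent with the note that it matches the centroid cluster type) and that "avg" averages the per-document cosine scores. The function name cluster_score and the string mode names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two dense vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cluster_score(doc, cluster, doc_mode="max"):
    """Score doc against a non-empty cluster (a list of vectors)."""
    if doc_mode == "mean":
        # Assumption: compare against the mean vector of the cluster.
        return cosine(doc, centroid(cluster))
    sims = [cosine(doc, member) for member in cluster]
    if doc_mode == "max":
        return max(sims)
    if doc_mode == "min":
        return min(sims)
    if doc_mode == "avg":
        # Assumption: average of the individual document scores.
        return sum(sims) / len(sims)
    raise ValueError("unknown doc_mode: " + doc_mode)

cluster = [[1.0, 0.0], [0.0, 1.0]]
doc = [1.0, 0.0]
for mode in ("max", "mean", "avg", "min"):
    print(mode, round(cluster_score(doc, cluster, mode), 4))
# max 1.0
# mean 0.7071
# avg 0.5
# min 0.0
```

Under these assumptions "mean" and "avg" give different scores for the same cluster, because the cosine of a mean vector is not the mean of the cosines.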

Generated on Tue Jun 15 11:02:58 2010 for Lemur by doxygen 1.3.4