Lemur Modules and Applications (Version 4.12)

The Lemur Toolkit comes bundled with many applications. The tables below group them by function and give a short description of each executable.

Application Types:

Parsing and Pre-processing
Building/Adding to an index
General Retrieval and Evaluation
InQuery Structured Query Language
Indri Structured Query Retrieval
Distributed IR and Query-based Sampling
Summarization
Document Clustering
User Interfaces

Parsing and Pre-processing:
Application	Description
ParseToFile	Parses documents compatible with Parser objects and writes output compatible with BasicDocStream
ParseQuery	Takes a document in NIST's Web or TREC format and creates queries
ParseInQueryOp	Parses a file containing structured queries into BasicDocStream format
Building/Adding to an index:
Application	Description
BuildIndex	Builds a KeyfileInc or Indri index.
BuildDocMgr	Builds a DocumentManager and Index for KeyfileInc indexes. (Indri has its own document manager built in)
BuildPropIndex	Builds a positional index that can associate properties with terms, such as part of speech and named entity tags
IndriBuildIndex	Builds an IndriIndex (Indri Repository) using Indri-style parameter files and parsing, rather than Lemur parameters or TextHandlers.
General Retrieval and Evaluation:
Application	Description
RetEval	Runs retrieval experiments (with or without feedback) to evaluate different retrieval models, such as simple TFIDF, Okapi, KL-divergence, and the Indri structured query language.
RelFBEval	Runs retrieval experiments with relevance feedback
QueryModelEval	Loads an expanded query model (e.g., one computed by GenerateQueryModel), and evaluates it with the KL-divergence retrieval model
TwoStageRetEval	Runs retrieval experiments, using the two-stage smoothing method for the initial retrieval and the KL-divergence model for feedback
GenL2Norm	Generates a support file for retrieval using cosine similarity
QueryClarity	Computes clarity scores for a query model
GenerateSmoothSupport	Generates two support files for retrieval using the language modeling approach to speed up the retrieval process
GenerateQueryModel	Computes an expanded query model based on feedback documents and the original query model for the KL-divergence retrieval method
EstimateDirPrior	Uses the leave-one-out method to estimate an optimal setting for the Dirichlet prior smoothing parameter
ireval	A Java utility that computes a variety of standard information retrieval metrics commonly used in TREC, including binary preference (BPREF), geometric mean average precision (GMAP), mean average precision (MAP), and standard precision and recall.
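
Several of the metrics named in the ireval description are easy to state in code. The following is a minimal Python sketch of average precision, MAP, and GMAP under their usual definitions. It is not the ireval implementation: the function names, the input shapes, and the `floor` value used to avoid taking the logarithm of zero are all assumptions made for illustration.

```python
import math


def average_precision(ranked_doc_ids, relevant_ids):
    """Average precision for one query.

    Sums precision at each rank where a relevant document appears,
    then divides by the total number of relevant documents.
    """
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_ids)


def mean_average_precision(ap_values):
    """Arithmetic mean of per-query average precision (MAP)."""
    return sum(ap_values) / len(ap_values)


def geometric_mean_average_precision(ap_values, floor=1e-5):
    """Geometric mean of per-query average precision (GMAP).

    The floor is an assumption of this sketch: it keeps a query with
    zero average precision from making the logarithm undefined.
    """
    log_sum = sum(math.log(max(ap, floor)) for ap in ap_values)
    return math.exp(log_sum / len(ap_values))


# Two toy queries: a ranked result list and the set of relevant documents.
ap_values = [
    average_precision(["d1", "d2", "d3", "d4"], {"d1", "d3"}),
    average_precision(["d5", "d6"], {"d6"}),
]
map_score = mean_average_precision(ap_values)
gmap_score = geometric_mean_average_precision(ap_values)
```

Because GMAP averages in log space, it penalizes queries with very low average precision more heavily than MAP does.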
InQuery Structured Query Language:
Application	Description
ParseInQueryOp	Parses a file containing InQuery structured queries into BasicDocStream format
StructQueryEval	Runs retrieval experiments to evaluate the performance of the structured query model using the InQuery retrieval method
Indri Structured Query Retrieval:
Application	Description
RetEval	Retrieval evaluation using the IndriRetMethod (using an IndriIndex)
IndriRunQuery	Retrieval evaluation for the Indri structured query language, directly using the Indri Repository API.
Distributed IR and Query-based Sampling:
Application	Description
CollSelIndex	Builds a collection selection database using either document frequency or collection term frequency for the database's term frequency count
DistRetEval	Does distributed retrieval, using a collection selection index and individual indexes
QryBasedSample	Performs query-based sampling on text databases
Summarization:
Application	Description
BasicSummApp	Demonstrates a simple summarizer
MMRSummApp	A more complex summarizer which does comparisons between passages
Document Clustering:
Application	Description
Cluster	Performs the basic online clustering task over documents in an index. Can be used for TDT topic detection.
OfflineCluster	Demonstrates the basic offline clustering task. Provides k-means and bisecting k-means partitional clustering.
PLSA	Performs Probabilistic Latent Semantic Analysis (PLSA) on a collection, building three probability tables.
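
For readers unfamiliar with the k-means partitional clustering mentioned under OfflineCluster, the loop can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the OfflineCluster code: it clusters small numeric vectors by squared Euclidean distance and seeds the centroids with the first k points, whereas the Lemur application works over documents in an index. The function name and parameters are hypothetical.

```python
def kmeans(points, k, iterations=10):
    """Toy k-means over equal-length numeric tuples.

    Assumption: centroids are seeded with the first k points so that
    the result is deterministic.
    """
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            assignments[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments, centroids


# Two well-separated groups of 2-D points.
points = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
assignments, centroids = kmeans(points, k=2)
```

Bisecting k-means builds on the same routine by repeatedly splitting one chosen cluster in two with k = 2 until the desired number of clusters is reached.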
User Interfaces:
Application	Description
Lemur CGI	Code for using Lemur as a CGI script from an HTTP server
Indexing and Retrieval GUI	GUIs written in Java/Swing for indexing and searching Lemur indexes