The Lemur Project is sponsored by the Advanced Research and Development Activity in Information Technology (ARDA) under its Statistical Language Modeling for Information Retrieval Research Program and by the National Science Foundation.

INDRI
Language modeling meets inference networks

Indri is a new search engine from the Lemur Project, a cooperative effort between the University of Massachusetts and Carnegie Mellon University to build language modeling information retrieval tools.

Effective

  • Best-in-class ad hoc retrieval performance

Flexible

  • Supports popular structured query operators from INQUERY
  • Open source, with a flexible BSD-inspired license
  • Parses PDF, HTML, XML, and TREC documents
  • Word and PowerPoint parsing (Windows only)

Usable

  • Supports UTF-8 encoded text
  • Language-independent tokenization of UTF-8 encoded documents
  • Includes both command line tools and a Java user interface
  • API can be used from Java, PHP, or C++
  • Works on Windows, Linux, Solaris and Mac OS X

Powerful

  • Can be used on a cluster of machines for faster indexing and retrieval
  • Suffix-based wildcard term matching
  • Field retrieval
  • Passage retrieval
  • Scales to terabyte-sized collections

The Lemur Project
Last modified: January 12, 2007, 11:44:44 am