Indri
Indri is a search engine that provides state-of-the-art text search and a rich structured query language for text collections of up to 50 million documents (single machine) or 500 million documents (distributed search). It is available for Linux, Solaris, Windows, and Mac OS X.
Indri is no longer under active development. Please see our latest project, Lucindri, which implements the Indri search logic on top of the Lucene search engine.
Lucindri Home Page
Lucindri Source Code on Github
Features
Powerful Query Interface
- Supports popular structured query operators from INQUERY
- Suffix-based wildcard term matching
- Field retrieval
- Passage retrieval
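As a rough illustration of the query features listed above, the following Python sketch assembles query strings of the kind the Indri query language accepts. The helper names (combine, ordered_window, in_field, wildcard) are hypothetical and are not part of any Indri API. The operator spellings used here (#combine, #N for ordered windows, term.field, [passageN:M], and a trailing * for suffix wildcards) are given as an assumption and should be checked against the Indri query language reference before use.

```python
# Hypothetical helpers that only build query strings; they do not call Indri.

def in_field(term, field):
    """Restrict a term to a field, e.g. obama.title (assumed syntax)."""
    return f"{term}.{field}"


def wildcard(stem):
    """Suffix wildcard: match any term that starts with `stem` (assumed syntax)."""
    return f"{stem}*"


def ordered_window(n, *terms):
    """Terms must appear in order, each within n words of the previous one."""
    return f"#{n}({' '.join(terms)})"


def combine(*terms, field=None, passage=None):
    """Build a #combine query, optionally scored over a field or over passages.

    passage is a (length, step) pair giving the passage size and the
    increment between passage starts, both in words (assumed semantics).
    """
    restriction = ""
    if field is not None:
        restriction = f"[{field}]"
    elif passage is not None:
        length, step = passage
        restriction = f"[passage{length}:{step}]"
    return f"#combine{restriction}({' '.join(terms)})"


if __name__ == "__main__":
    # INQUERY-style structured operators
    print(combine(ordered_window(1, "white", "house"), "press"))
    # Field retrieval and suffix wildcard
    print(combine(in_field("obama", "title"), wildcard("presid")))
    # Passage retrieval: 100-word passages starting every 50 words
    print(combine("white", "house", passage=(100, 50)))
```

The strings produced this way would then be passed to whichever Indri front end is in use, such as the command line tools or one of the language APIs.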
Flexible Indexing and Document Support
- Supports UTF-8 encoded text
- Language-independent tokenization of UTF-8 encoded documents
- Parses PDF, HTML, XML, and TREC documents
- Word and PowerPoint parsing (Windows only)
- Text annotations
- Document metadata
Package Versatility
- Open source, with a flexible BSD-inspired license
- Includes both command-line tools and a Java user interface
- API can be used from Java, PHP, or C++
- Works on Windows, Linux, Solaris, and Mac OS X
Scalability and Efficiency
- Best-in-class ad hoc retrieval performance
- Can be used on a cluster of machines for faster indexing and retrieval
- Scales to terabyte-sized collections