Sifaka
Sifaka is a text mining application built on top of an open-source search engine. It stores documents using multiple types of text representation (e.g., terms, bigrams, trigrams, noun phrases, named entities), and documents may carry optional category labels. Sifaka supports typical full-text search, saved sets, frequency analysis, co-occurrence analysis, and export of feature vectors compatible with Weka.
Features
- Full-text search
- Saved sets of documents
- Frequency analysis
- Co-occurrence analysis
- Export of feature vectors compatible with Weka
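To make the analysis features concrete, here is a minimal Python sketch of what frequency analysis, co-occurrence analysis, and a Weka-compatible export compute. It is not Sifaka's implementation and does not use Sifaka's index. The toy corpus, the labels, and the function names (`document_frequency`, `cooccurrence`, `to_arff`) are all hypothetical. The only external fact assumed is Weka's ARFF text format (`@relation`, `@attribute`, `@data`).

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: each document is a list of already-extracted entities.
docs = [
    ["lemur", "madagascar", "forest"],
    ["lemur", "forest"],
    ["madagascar", "island"],
]
# Hypothetical category labels, one per document.
labels = ["animal", "animal", "place"]


def document_frequency(docs):
    """Count the number of documents in which each entity appears."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # set() so repeats within a document count once
    return df


def cooccurrence(docs):
    """Count the number of documents shared by each unordered entity pair."""
    pairs = Counter()
    for doc in docs:
        # Sort so that each pair has a single canonical ordering.
        pairs.update(combinations(sorted(set(doc)), 2))
    return pairs


def to_arff(docs, labels, relation="sifaka_sketch"):
    """Render per-document entity counts as ARFF text.

    Assumes entity names and labels contain no spaces or commas;
    real ARFF output would need to quote such values.
    """
    vocab = sorted({entity for doc in docs for entity in doc})
    classes = sorted(set(labels))
    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute {entity} numeric" for entity in vocab]
    lines.append("@attribute class {" + ",".join(classes) + "}")
    lines += ["", "@data"]
    for doc, label in zip(docs, labels):
        counts = Counter(doc)
        row = [str(counts[entity]) for entity in vocab] + [label]
        lines.append(",".join(row))
    return "\n".join(lines)


if __name__ == "__main__":
    print(document_frequency(docs).most_common(3))
    print(cooccurrence(docs).most_common(3))
    print(to_arff(docs, labels))
```

The sketch counts at the document level: an entity or pair is counted once per document no matter how often it repeats inside that document. Whether Sifaka counts this way is not stated in the text above, so treat that choice as an assumption.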
Download
Sifaka can be obtained from the SourceForge Lemur Project Page.
Release History
The first version of Sifaka was released in December 2016. Release notes for the current release can be found on SourceForge.
Tutorial Links
- Quick start
- Using and implementing document parsers
- Opening an index
- Index properties
- Searching the index
- Creating and using saved sets
- Finding the most frequent entities in an index
- Finding entities that co-occur in an index
- Creating feature vectors for classification