Sifaka Text Mining - Application Overview

Open Indexes

Left Navigation

The left navigation bar shows the indexes that have been opened and any saved searches. To combine saved search sets, right-click the index and select the sets to combine.

Content Tabs

Click on an index in the Left Nav to run experiments with that index.
Properties
This tab contains information on the index.
Search
The corpus can be searched on this screen. Save the resulting documents by pressing "Save Search Results". The resulting set will appear under the index in the Left Nav. To run experiments on that set of documents, click on it in the Left Nav.

Sets can be combined by right-clicking the index in the Left Nav and selecting the sets to combine.
Frequency
The Frequency tab is for finding the most frequent entities in the corpus. After finding the top entities, right-click any result to search for it. The list of entities can be saved to a CSV file.
Note: No experiment should take more than a few minutes. Check that the right version of Java is installed if any experiment is taking more than 5 minutes.
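
For readers who want to see what a frequency count and CSV export amount to, here is a minimal Python sketch. It is not Sifaka's implementation: it assumes whitespace-separated lowercase tokens stand in for entities, and the function names (top_terms, to_csv) and the "term,count" column layout are hypothetical.

```python
import csv
import io
from collections import Counter


def top_terms(documents, n):
    """Return the n most frequent lowercase tokens across all documents."""
    counts = Counter()
    for text in documents:
        # Assumption: a simple whitespace split stands in for entity extraction.
        counts.update(text.lower().split())
    return counts.most_common(n)


def to_csv(rows):
    """Serialize (term, count) rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["term", "count"])  # hypothetical column names
    writer.writerows(rows)
    return buffer.getvalue()


docs = ["apple banana apple", "banana apple cherry"]
result = top_terms(docs, 2)   # [("apple", 3), ("banana", 2)]
csv_text = to_csv(result)
print(csv_text)
```

The CSV text could be written to disk with an ordinary file write; the in-memory buffer is used here only to keep the sketch self-contained.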
Co-occurrence
The Co-occurrence tab is for finding entities that co-occur. Right-click a result to search for that entity. Results can be saved to CSV.
Note: No experiment should take more than a few minutes. Check that the right version of Java is installed if any experiment is taking more than 5 minutes.
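
As a rough illustration of the idea behind co-occurrence, the Python sketch below counts how many documents contain both a target term and each other term. This is an assumption made for illustration only: the measure and the unit of co-occurrence (document, window, sentence) that Sifaka actually uses are not described here, and the function name cooccurring_terms is hypothetical.

```python
from collections import Counter


def cooccurring_terms(documents, target, n):
    """Return the n terms that share the most documents with `target`."""
    counts = Counter()
    for text in documents:
        # Use a set so each term counts at most once per document.
        tokens = set(text.lower().split())
        if target in tokens:
            # Sort for a deterministic insertion order when counts tie.
            counts.update(sorted(tokens - {target}))
    return counts.most_common(n)


docs = ["rain umbrella", "rain umbrella coat", "sun hat"]
pairs = cooccurring_terms(docs, "rain", 2)   # [("umbrella", 2), ("coat", 1)]
print(pairs)
```

Raw document counts favor terms that are common everywhere, which is why co-occurrence tools often normalize the counts with an association measure; that step is left out of this sketch.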
Feature Vectors
Note: Feature Vectors are only available for indexes/datasets with labels for classification. This tab will not appear for indexes/datasets that have not been labeled.
The Feature Vectors tab can be used on indexes that have labeled classes. Select the types of features to include and how many times each feature must occur to be included in the results. To use these features in Weka, enter the number of top features to use for each category and click "Save Results". This will create an ARFF file which can be imported into Weka.
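
To give a sense of what an ARFF file for Weka looks like, the following Python sketch writes a tiny one from labeled feature counts. The relation name, feature names, and class labels are made up for illustration, the helper to_arff is hypothetical, and the file Sifaka produces may be laid out differently (for example, it may use other attribute types or the sparse ARFF form).

```python
def to_arff(relation, feature_names, class_labels, rows):
    """Build dense ARFF text from (feature_values, class_label) rows.

    Assumes names and labels contain no spaces or commas; ARFF requires
    quoting for those, which this sketch does not handle.
    """
    lines = [f"@relation {relation}", ""]
    for name in feature_names:
        lines.append(f"@attribute {name} numeric")
    # The class is a nominal attribute listing every allowed label.
    lines.append("@attribute class {" + ",".join(class_labels) + "}")
    lines.append("")
    lines.append("@data")
    for values, label in rows:
        lines.append(",".join(str(v) for v in values) + "," + label)
    return "\n".join(lines) + "\n"


arff_text = to_arff(
    "reviews",                      # hypothetical relation name
    ["good", "bad"],                # hypothetical term features
    ["pos", "neg"],                 # hypothetical class labels
    [([2, 0], "pos"), ([0, 3], "neg")],
)
print(arff_text)
```

Each data row lists one value per declared attribute in declaration order, with the class label last, which is the layout Weka expects when the class attribute is declared last.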