#include <FreqCounter.hpp>
Inherits lemur::api::TextHandler.
Public Member Functions | |
FreqCounter (const lemur::api::Stopper *stopWords=NULL) | |
FreqCounter (const string &filename, const lemur::api::Stopper *stopWords=NULL) | |
~FreqCounter () | |
Delete the frequency counter. | |
void | clear () |
Clear the frequency counter (set all counts to 0). | |
void | output (const string &filename) const |
Output the frequency information to a file. | |
char * | randomWord () |
void | setRandomMode (int mode) |
int | getRandomMode () const |
char * | randomCtf () const |
char * | randomDf () const |
char * | randomAveTf () const |
char * | randomUniform () const |
int | numWords () const |
int | totWords () const |
const freqmap * | getFreqInfo () const |
int | getCtf (const char *word) const |
int | getDf (const char *word) const |
double | getAveTf (const char *word) const |
double | ctfRatio (FreqCounter &lm1) const |
char * | handleDoc (char *docno) |
Overridden from TextHandler. | |
char * | handleWord (char *word) |
Overridden from TextHandler - increments collection term frequencies. | |
void | endDoc () |
Specifies end of a document - updates document frequencies. | |
void | setName (const string &freqCounterName) |
Set the name of language model described by the frequency counter. | |
const string & | getName () const |
Get the counter's name. | |
void | pruneBottomWords (int topWords) |
Prune least frequent words, keeping only topWords most frequent words. | |
Protected Member Functions | |
void | input (const string &filename) |
Protected Attributes | |
freqmap | freqInfo |
stringset | doc |
stringset | randdone |
string | name |
const lemur::api::Stopper * | stopper |
long | ctfTot |
int | dfTot |
long double | avetfTot |
bool | atfValid |
int | randomMode |
int | nWords |
|
Create a frequency counter with the specified stopword list. The stopWords parameter is optional. |
|
Create a frequency counter (loading it from file) with the specified stopword list. The stopWords parameter is optional. |
|
Delete the frequency counter.
|
|
Clear the frequency counter (set all counts to 0).
|
|
Compare lm1 to this language model, returning the ctf ratio. |
|
Specifies end of a document - updates document frequencies.
|
|
Get the average term frequency for a word. |
|
Get the collection term frequency for a word. |
|
Get the document frequency for a word. |
|
Get a pointer to the internal frequency count map. |
|
Get the counter's name.
|
|
Gets the current random word mode. See setRandomMode(...) |
|
Overridden from TextHandler.
Reimplemented from lemur::api::TextHandler. |
|
Overridden from TextHandler - increments collection term frequencies.
Reimplemented from lemur::api::TextHandler. |
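The interplay between handleWord, endDoc and the frequency getters can be illustrated with a small stand-in class. This is a sketch, not the Lemur implementation: MiniFreqCounter and its members are hypothetical, it works on std::string rather than char *, it ignores stopword filtering, and it assumes that the average term frequency of a word is its collection term frequency divided by its document frequency.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in that mirrors the counting behaviour described above.
class MiniFreqCounter {
public:
    // Called once per token: increments the collection term frequency and
    // records that the word occurred in the current document.
    void handleWord(const std::string& word) {
        ++counts_[word].ctf;
        currentDoc_.insert(word);
        ++totalWords_;
    }

    // Called at the end of a document: every distinct word seen in the
    // document adds one to its document frequency.
    void endDoc() {
        for (const auto& word : currentDoc_) {
            ++counts_[word].df;
        }
        currentDoc_.clear();
    }

    long getCtf(const std::string& word) const {
        auto it = counts_.find(word);
        return it == counts_.end() ? 0 : it->second.ctf;
    }

    long getDf(const std::string& word) const {
        auto it = counts_.find(word);
        return it == counts_.end() ? 0 : it->second.df;
    }

    // Assumption: average term frequency = ctf / df.
    double getAveTf(const std::string& word) const {
        long df = getDf(word);
        return df == 0 ? 0.0 : static_cast<double>(getCtf(word)) / df;
    }

    std::size_t numWords() const { return counts_.size(); }  // unique words
    long totWords() const { return totalWords_; }            // all tokens

private:
    struct Counts {
        long ctf = 0;  // occurrences across all documents
        long df = 0;   // number of documents containing the word
    };
    std::map<std::string, Counts> counts_;
    std::set<std::string> currentDoc_;  // distinct words of the open document
    long totalWords_ = 0;
};
```

Feeding the documents "apple apple pear" and "apple" through this sketch gives apple a ctf of 3, a df of 2 and an average term frequency of 1.5.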
|
|
|
Return the number of unique words seen across all documents processed. |
|
Output the frequency information to a file.
|
|
Prune least frequent words, keeping only topWords most frequent words.
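One way to keep only the most frequent words is a partial sort over the table entries. The sketch below assumes that "most frequent" is measured by collection term frequency and that ties are broken alphabetically. Both are assumptions of this example, as are the names CtfTable and pruneBottomWordsSketch; the page does not say how pruneBottomWords ranks words.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a term -> collection term frequency table.
using CtfTable = std::map<std::string, long>;
using Entry = std::pair<std::string, long>;

// Keep only the topWords entries with the highest counts.
// Ties are broken alphabetically so the result is deterministic.
void pruneBottomWordsSketch(CtfTable& table, std::size_t topWords) {
    if (table.size() <= topWords) {
        return;  // nothing to prune
    }
    std::vector<Entry> entries(table.begin(), table.end());
    // Move the topWords most frequent entries to the front of the vector.
    std::partial_sort(entries.begin(), entries.begin() + topWords, entries.end(),
                      [](const Entry& a, const Entry& b) {
                          if (a.second != b.second) {
                              return a.second > b.second;
                          }
                          return a.first < b.first;
                      });
    entries.erase(entries.begin() + topWords, entries.end());
    table = CtfTable(entries.begin(), entries.end());
}
```

A partial sort avoids ordering the discarded tail, which matters when the vocabulary is much larger than topWords.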
|
|
Select a word at random using average term frequency. This word is not guaranteed to be unique across calls to this function. |
|
Select a word at random using collection term frequency. This word is not guaranteed to be unique across calls to this function. |
|
Select a word at random using document frequency. This word is not guaranteed to be unique across calls to this function. |
|
Select a word at random with equal probability for each word. This word is not guaranteed to be unique across calls to this function. |
|
Get a random word from the distribution specified by setRandomMode. The returned word is unique among the words returned since the last clear operation. |
|
Set the name of language model described by the frequency counter.
|
|
Set the random word selection mode:
R_CTF - select using collection term frequency
R_DF - select using document frequency
R_AVE_TF - select using average term frequency
R_UNIFORM - select each word with equal probability |
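The four selection modes and the uniqueness guarantee of randomWord can be sketched as weighted sampling with an exclusion set, similar in spirit to the randdone attribute listed above. Everything in this block is a hypothetical illustration: the Mode enum stands in for R_CTF, R_DF, R_AVE_TF and R_UNIFORM, the average term frequency is assumed to be ctf / df, and the real class returns char * and manages its own random source.

```cpp
#include <cstddef>
#include <map>
#include <random>
#include <set>
#include <string>
#include <vector>

// Hypothetical counterparts of R_CTF, R_DF, R_AVE_TF and R_UNIFORM.
enum class Mode { Ctf, Df, AveTf, Uniform };

struct WordStats {
    long ctf;  // collection term frequency
    long df;   // document frequency
};

// Sampling weight of a word under the given mode.
double weightFor(const WordStats& s, Mode mode) {
    switch (mode) {
        case Mode::Ctf:     return static_cast<double>(s.ctf);
        case Mode::Df:      return static_cast<double>(s.df);
        case Mode::AveTf:   return s.df > 0 ? static_cast<double>(s.ctf) / s.df : 0.0;
        case Mode::Uniform: return 1.0;
    }
    return 0.0;
}

// Draw a word with probability proportional to its weight, skipping words
// already recorded in `done`. Returns an empty string when no word is left.
std::string randomWordSketch(const std::map<std::string, WordStats>& stats,
                             Mode mode,
                             std::set<std::string>& done,
                             std::mt19937& rng) {
    std::vector<std::string> words;
    std::vector<double> weights;
    for (const auto& entry : stats) {
        if (done.count(entry.first) > 0) {
            continue;  // already returned since the last reset
        }
        double w = weightFor(entry.second, mode);
        if (w <= 0.0) {
            continue;  // zero-weight words can never be drawn
        }
        words.push_back(entry.first);
        weights.push_back(w);
    }
    if (words.empty()) {
        return std::string();
    }
    std::discrete_distribution<int> pick(weights.begin(), weights.end());
    std::string chosen = words[static_cast<std::size_t>(pick(rng))];
    done.insert(chosen);
    return chosen;
}
```

Clearing the `done` set in this sketch plays the role that the clear operation plays for randomWord: previously returned words become eligible again.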
|
Return the total number of words seen across all documents processed. |