#include <TextHandler.hpp>
Inheritance diagram for lemur::api::TextHandler:
Public Types | |
enum | TokenType { BEGINDOC = 1, ENDDOC = 2, WORDTOK = 3, BEGINTAG = 4, ENDTAG = 5, SYMBOLTOK = 6 } |
Public Member Functions | |
TextHandler () | |
virtual | ~TextHandler () |
virtual void | setTextHandler (TextHandler *th) |
Set the TextHandler that this TextHandler will pass information on to. | |
virtual TextHandler * | getTextHandler () |
Get the TextHandler that this TextHandler will pass information on to. | |
virtual TextHandler * | getPrevHandler () |
Get the TextHandler that this TextHandler gets info from. | |
virtual void | foundToken (TokenType type, const char *token=NULL, const char *orig=NULL, lemur::parse::PropertyList *properties=NULL) |
virtual char * | handleBeginDoc (char *docno, const char *original, lemur::parse::PropertyList *list) |
virtual char * | handleEndDoc (char *token, const char *original, lemur::parse::PropertyList *list) |
virtual char * | handleWord (char *word, const char *original, lemur::parse::PropertyList *list) |
virtual char * | handleBeginTag (char *tag, const char *original, lemur::parse::PropertyList *list) |
Handle a begin tag. | |
virtual char * | handleEndTag (char *tag, const char *original, lemur::parse::PropertyList *list) |
Handle an end tag. | |
virtual char * | handleSymbol (char *symbol, const char *original, lemur::parse::PropertyList *list) |
virtual void | foundDoc (char *docno) |
Found a document with document number. | |
virtual void | foundDoc (char *docno, const char *original) |
virtual void | foundWord (char *word) |
Found a word. | |
virtual void | foundWord (char *word, const char *original) |
virtual void | foundEndDoc () |
Found end of doc. | |
virtual void | foundSymbol (const char *sym) |
Found a symbol. | |
virtual char * | handleDoc (char *docno) |
Handle a doc. | |
virtual char * | handleWord (char *word) |
Handle a word, possibly transforming it. | |
virtual void | handleEndDoc () |
Handle the end of the doc. | |
virtual char * | handleSymbol (char *sym) |
Handle a symbol, possibly transforming it. | |
virtual string | getCategory () const |
Return the category of this TextHandler. | |
virtual string | getIdentifier () const |
Return a unique identifier for this TextHandler object. | |
virtual void | writePropertyList (lemur::parse::PropertyList *list) const |
Write out the properties associated with this TextHandler into the given list. | |
Static Public Attributes | |
const string | category = "TextHandler" |
const string | identifier = "TextHandler" |
Protected Member Functions | |
virtual void | setPrevHandler (TextHandler *th) |
Set the TextHandler that this TextHandler gets info from. | |
virtual void | destroyPrevHandler () |
The PrevHandler is being destroyed; try to fix the chain. | |
virtual void | destroyTextHandler () |
The next TextHandler is being destroyed; try to fix the chain. | |
Protected Attributes | |
TextHandler * | textHandler |
The next textHandler in the chain. | |
TextHandler * | prevHandler |
The previous textHandler in the chain. | |
string | cat |
string | iden |
char | buffer [MAXWORDSIZE] |
The setTextHandler function allows chaining of TextHandlers, so that information is passed from one TextHandler to the next. This is useful for chaining things like stopword lists and stemmers.
A source in the chain of TextHandlers does not need to do anything in the foundDoc and foundWord functions. An example of a source is a parser. A destination in the chain of TextHandlers does not need to forward calls or store a TextHandler when the setTextHandler function is called. An example of a destination is a class that pushes the words and documents into an InvFPPushIndex (InvFPTextHandler) or writes them to a file (WriterTextHandler). Classes in the middle of a chain, like Stopper or Stemmer, need to provide full functionality for all functions. When their foundDoc or foundWord is called, they may manipulate the data and then forward it by calling the foundDoc/foundWord function of their TextHandler. The original token text should be preserved and passed on as is. Properties can be associated with a token using the PropertyList.
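The source, middle, and destination roles described above can be sketched with a small stand-in hierarchy. This is a minimal illustration and not the Lemur implementation: the names Handler, setNext, WhitespaceSource, LowerCaser, StopFilter, Collector, and runChainDemo are hypothetical, and std::string is used in place of the char * interface to keep the sketch short.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the handler interface; not the Lemur class.
class Handler {
public:
  virtual ~Handler() {}
  // Remember the next handler in the chain.
  virtual void setNext(Handler *next) { next_ = next; }
  // Default behavior: forward unchanged to the next handler, if any.
  virtual void foundDoc(const std::string &docno) {
    if (next_ != nullptr) next_->foundDoc(docno);
  }
  virtual void foundWord(const std::string &word, const std::string &original) {
    if (next_ != nullptr) next_->foundWord(word, original);
  }
protected:
  Handler *next_ = nullptr;
};

// Source: only pushes tokens downstream; nothing calls its found* functions.
class WhitespaceSource : public Handler {
public:
  void parse(const std::string &docno, const std::string &text) {
    if (next_ == nullptr) return;
    next_->foundDoc(docno);
    std::istringstream in(text);
    std::string token;
    while (in >> token) next_->foundWord(token, token);
  }
};

// Middle: transforms the word but passes the original on as is.
class LowerCaser : public Handler {
public:
  void foundWord(const std::string &word, const std::string &original) override {
    std::string lowered = word;
    for (std::size_t i = 0; i < lowered.size(); ++i)
      lowered[i] = static_cast<char>(
          std::tolower(static_cast<unsigned char>(lowered[i])));
    Handler::foundWord(lowered, original);
  }
};

// Middle: drops stopwords and forwards everything else.
class StopFilter : public Handler {
public:
  explicit StopFilter(const std::set<std::string> &stopwords)
      : stopwords_(stopwords) {}
  void foundWord(const std::string &word, const std::string &original) override {
    if (stopwords_.count(word) == 0) Handler::foundWord(word, original);
  }
private:
  std::set<std::string> stopwords_;
};

// Destination: stores what it receives and never forwards.
class Collector : public Handler {
public:
  void setNext(Handler *) override {}  // nothing to store for a destination
  void foundDoc(const std::string &docno) override { docs.push_back(docno); }
  void foundWord(const std::string &word, const std::string &original) override {
    words.push_back(word);
    originals.push_back(original);
  }
  std::vector<std::string> docs, words, originals;
};

// Wire source -> lowercaser -> stopper -> collector and run one document.
Collector runChainDemo() {
  std::set<std::string> stopwords;
  stopwords.insert("the");
  WhitespaceSource source;
  LowerCaser lower;
  StopFilter stop(stopwords);
  Collector sink;
  source.setNext(&lower);
  lower.setNext(&stop);
  stop.setNext(&sink);
  source.parse("DOC-1", "The Quick fox");
  return sink;
}
```

In this sketch the collector ends up with the transformed words and, alongside them, the untouched originals, which mirrors the requirement that the original be passed on as is.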
TextHandlers have their own internal buffer for modification of the string. The foundWord function copies the word into the buffer then calls handleWord with the copy. The handleWord function may then modify the string and return the pointer to the string. This process is also done for foundDoc/handleDoc.
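The copy-then-handle pattern can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: BufferedHandler, PluralStripper, Recorder, setNext, and stripDemo are hypothetical names, and kMaxWordSize is an assumed value standing in for MAXWORDSIZE, whose real value is not given here.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Assumption: stands in for MAXWORDSIZE; the real value is not shown above.
const std::size_t kMaxWordSize = 32;

// Hypothetical handler that copies each word into its own buffer
// before letting handleWord modify it.
class BufferedHandler {
public:
  BufferedHandler() : next_(nullptr) { buffer_[0] = '\0'; }
  virtual ~BufferedHandler() {}
  void setNext(BufferedHandler *next) { next_ = next; }

  // Copy the word (truncating if needed), hand the copy to handleWord,
  // then forward whatever handleWord returned together with the original.
  virtual void foundWord(const char *word, const char *original) {
    char *copy = nullptr;
    if (word != nullptr) {
      std::strncpy(buffer_, word, kMaxWordSize - 1);
      buffer_[kMaxWordSize - 1] = '\0';
      copy = buffer_;
    }
    char *result = handleWord(copy);
    if (next_ != nullptr) next_->foundWord(result, original);
  }

  // Default: leave the word unchanged.
  virtual char *handleWord(char *word) { return word; }

protected:
  BufferedHandler *next_;
  char buffer_[kMaxWordSize];
};

// Toy transformation: strip one trailing 's' in place.
class PluralStripper : public BufferedHandler {
public:
  char *handleWord(char *word) override {
    if (word != nullptr) {
      std::size_t len = std::strlen(word);
      if (len > 1 && word[len - 1] == 's') word[len - 1] = '\0';
    }
    return word;
  }
};

// Destination that remembers the last word it was given.
class Recorder : public BufferedHandler {
public:
  char *handleWord(char *word) override {
    last = (word != nullptr) ? word : "";
    return word;
  }
  std::string last;
};

// Run one word through stripper -> recorder and return what was recorded.
std::string stripDemo(const char *input) {
  PluralStripper stripper;
  Recorder recorder;
  stripper.setNext(&recorder);
  stripper.foundWord(input, input);
  return recorder.last;
}
```

Because handleWord only ever sees the handler's private copy, the caller's string is never modified, and an over-long word is truncated to fit the buffer instead of overflowing it.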
It might make more sense to split this class into TextSource and TextDestination, with classes in the middle of the chain inheriting from both.
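One way to picture that alternative design is sketched below. Nothing here exists in the toolkit: TextSource and TextDestination are the names suggested in the note, while setDestination, emitWord, ListSource, MinLengthFilter, WordSink, and runSplitDemo are invented purely for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical receiving side: anything that can be handed a word.
class TextDestination {
public:
  virtual ~TextDestination() {}
  virtual void foundWord(const std::string &word) = 0;
};

// Hypothetical sending side: anything that pushes words to a destination.
class TextSource {
public:
  virtual ~TextSource() {}
  void setDestination(TextDestination *dest) { dest_ = dest; }
protected:
  void emitWord(const std::string &word) {
    if (dest_ != nullptr) dest_->foundWord(word);
  }
private:
  TextDestination *dest_ = nullptr;
};

// Pure source: emits a fixed list of words.
class ListSource : public TextSource {
public:
  void run(const std::vector<std::string> &words) {
    for (std::size_t i = 0; i < words.size(); ++i) emitWord(words[i]);
  }
};

// Middle of the chain: inherits from both sides.
class MinLengthFilter : public TextSource, public TextDestination {
public:
  explicit MinLengthFilter(std::size_t minLength) : minLength_(minLength) {}
  void foundWord(const std::string &word) override {
    if (word.size() >= minLength_) emitWord(word);
  }
private:
  std::size_t minLength_;
};

// Pure destination: stores the words it receives.
class WordSink : public TextDestination {
public:
  void foundWord(const std::string &word) override { words.push_back(word); }
  std::vector<std::string> words;
};

// Wire source -> filter -> sink and return what reached the sink.
std::vector<std::string> runSplitDemo() {
  ListSource source;
  MinLengthFilter filter(3);
  WordSink sink;
  source.setDestination(&filter);
  filter.setDestination(&sink);
  std::vector<std::string> input;
  input.push_back("an");
  input.push_back("index");
  input.push_back("of");
  input.push_back("terms");
  source.run(input);
  return sink.words;
}
```

With this split, a pure source has no found* functions to leave empty and a pure destination has no forwarding pointer to ignore, which is the asymmetry the note points at.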
The PrevHandler is being destroyed; try to fix the chain.

The next TextHandler is being destroyed; try to fix the chain.

Found a document with document number.

Found end of doc.

Found a symbol.

Found a word.

Return the category of this TextHandler.

Return a unique identifier for this TextHandler object.

Get the TextHandler that this TextHandler gets info from.

Get the TextHandler that this TextHandler will pass information on to.

Handle the beginning of a doc. The default implementation calls handleDoc for backward compatibility.

Handle a begin tag.
Reimplemented in lemur::parse::IndriTextHandler and lemur::parse::ElemDocMgr.

Handle the end of the doc.
Reimplemented in lemur::distrib::DocFreqIndexer, lemur::parse::IndriTextHandler, and lemur::parse::KeyfileDocMgr.

Handle the end of a doc. The default implementation calls the old handleEndDoc for backward compatibility.

Handle an end tag.
Reimplemented in lemur::parse::IndriTextHandler and lemur::parse::ElemDocMgr.

Handle a word, possibly transforming it.
Reimplemented in lemur::parse::WriterInQueryHandler, lemur::parse::StringQuery, and lemur::api::QueryDocument.

Handle a symbol. The default implementation calls the old handleSymbol for backward compatibility.

Handle a word. The default implementation calls the old handleWord for backward compatibility.
Reimplemented in lemur::parse::IndriTextHandler, lemur::parse::PropIndexTH, and lemur::parse::BrillPOSTokenizer.

Set the TextHandler that this TextHandler gets info from.

Set the TextHandler that this TextHandler will pass information on to.

Write out the properties associated with this TextHandler into the given list.
Reimplemented in lemur::parse::ArabicStemmer and lemur::api::Stopper.

Reimplemented in lemur::api::Parser, lemur::api::Stemmer, and lemur::api::Stopper.

The previous textHandler in the chain.

The next textHandler in the chain.