lemur::api::TextHandler Class Reference

This class serves as an interface for classes working with the parsers.

#include <TextHandler.hpp>

Inheritance diagram for lemur::api::TextHandler:

lemur::api::Parser, lemur::api::QueryDocument, lemur::api::Stemmer, lemur::api::Stopper, lemur::distrib::CtfIndexer, lemur::distrib::DocFreqIndexer, lemur::distrib::FreqCounter, lemur::distrib::MemParser, lemur::parse::BrillPOSTokenizer, lemur::parse::DocOffsetParser, lemur::parse::IndriTextHandler, lemur::parse::KeyfileDocMgr, lemur::parse::KeyfileTextHandler, lemur::parse::PropIndexTH, lemur::parse::QueryTextHandler, lemur::parse::StringQuery, lemur::parse::WriterInQueryHandler, lemur::parse::WriterTextHandler

Public Types

enum  TokenType {
  BEGINDOC = 1, ENDDOC = 2, WORDTOK = 3, BEGINTAG = 4,
  ENDTAG = 5, SYMBOLTOK = 6
}

Public Member Functions

 TextHandler ()
virtual ~TextHandler ()
virtual void setTextHandler (TextHandler *th)
 Set the TextHandler that this TextHandler will pass information on to.

virtual TextHandler * getTextHandler ()
 Get the TextHandler that this TextHandler will pass information on to.

virtual TextHandler * getPrevHandler ()
 Get the TextHandler that this TextHandler gets info from.

virtual void foundToken (TokenType type, const char *token=NULL, const char *orig=NULL, lemur::parse::PropertyList *properties=NULL)
virtual char * handleBeginDoc (char *docno, const char *original, lemur::parse::PropertyList *list)
virtual char * handleEndDoc (char *token, const char *original, lemur::parse::PropertyList *list)
virtual char * handleWord (char *word, const char *original, lemur::parse::PropertyList *list)
virtual char * handleBeginTag (char *tag, const char *original, lemur::parse::PropertyList *list)
 Handle a begin tag.

virtual char * handleEndTag (char *tag, const char *original, lemur::parse::PropertyList *list)
 Handle an end tag.

virtual char * handleSymbol (char *symbol, const char *original, lemur::parse::PropertyList *list)
virtual void foundDoc (char *docno)
 Found a document with document number.

virtual void foundDoc (char *docno, const char *original)
virtual void foundWord (char *word)
 Found a word.

virtual void foundWord (char *word, const char *original)
virtual void foundEndDoc ()
 Found end of doc.

virtual void foundSymbol (const char *sym)
 Found a symbol.

virtual char * handleDoc (char *docno)
 Handle a doc.

virtual char * handleWord (char *word)
 Handle a word, possibly transforming it.

virtual void handleEndDoc ()
 Handle the end of the doc.

virtual char * handleSymbol (char *sym)
 Handle a symbol, possibly transforming it.

virtual string getCategory () const
 Return the category of this TextHandler.

virtual string getIdentifier () const
 Return a unique identifier for this TextHandler object.

virtual void writePropertyList (lemur::parse::PropertyList *list) const
 Write out the properties associated with this TextHandler into the given list.


Static Public Attributes

const string category = "TextHandler"
const string identifier = "TextHandler"

Protected Member Functions

virtual void setPrevHandler (TextHandler *th)
 Set the TextHandler that this TextHandler gets info from.

virtual void destroyPrevHandler ()
 The previous handler (prevHandler) is being destroyed; try to fix the chain.

virtual void destroyTextHandler ()
 The next handler (textHandler) is being destroyed; try to fix the chain.


Protected Attributes

TextHandler * textHandler
 The next textHandler in the chain.

TextHandler * prevHandler
 The previous textHandler in the chain.

string cat
string iden
char buffer [MAXWORDSIZE]

Detailed Description

This class serves as an interface for classes working with the parsers.

The setTextHandler function allows chaining of TextHandlers, so that information is passed from one TextHandler to the next. This is useful for chaining things like stopword lists and stemmers.

A source in the chain of TextHandlers, such as a parser, does not need to do anything in the foundDoc and foundWord functions.

A destination in the chain does not need to forward calls or store a TextHandler when the setTextHandler function is called. An example of a destination is a class that pushes the words and documents into an InvFPPushIndex (InvFPTextHandler) or writes them to a file (WriterTextHandler).

Classes in the middle of a chain, like Stopper or Stemmer, need to provide full functionality for all functions. When their foundDoc or foundWord is called, they may manipulate the data and then forward it by calling the foundDoc/foundWord function of their TextHandler. The original should be preserved and passed on as is. Properties can be associated with a token using the PropertyList.

TextHandlers have their own internal buffer for modification of the string. The foundWord function copies the word into the buffer then calls handleWord with the copy. The handleWord function may then modify the string and return the pointer to the string. This process is also done for foundDoc/handleDoc.
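The chaining and buffering described above can be sketched as follows. This is a minimal illustration, not the Lemur implementation: MiniHandler, Lowercaser, TinyStopper, Collector, kMaxWordSize and runChainDemo are hypothetical names, the two-argument handleWord is a simplification, and forwarding a NULL word after a handler drops it is an assumption.

```cpp
#include <cctype>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical, stripped-down stand-in for lemur::api::TextHandler.
// It models only the forward link and the foundWord/handleWord pair.
class MiniHandler {
public:
  MiniHandler() : next(nullptr) { buffer[0] = '\0'; }
  virtual ~MiniHandler() {}

  // Set the handler that this handler passes information on to.
  void setTextHandler(MiniHandler *th) { next = th; }

  // Copy the word into the internal buffer, let handleWord work on the copy,
  // then forward the result together with the untouched original.
  void foundWord(const char *word, const char *original) {
    char *result = nullptr;
    if (word != nullptr) {
      std::strncpy(buffer, word, kMaxWordSize - 1);
      buffer[kMaxWordSize - 1] = '\0';
      result = handleWord(buffer, original);
    }
    if (next != nullptr) next->foundWord(result, original);
  }

protected:
  // Default behaviour: pass the word through unchanged.
  virtual char *handleWord(char *word, const char * /*original*/) { return word; }

private:
  enum { kMaxWordSize = 64 };  // assumed size; the real class uses MAXWORDSIZE
  MiniHandler *next;
  char buffer[kMaxWordSize];
};

// Middle of the chain: rewrites the buffered copy in place.
class Lowercaser : public MiniHandler {
protected:
  char *handleWord(char *word, const char *) override {
    for (char *p = word; *p != '\0'; ++p)
      *p = static_cast<char>(std::tolower(static_cast<unsigned char>(*p)));
    return word;
  }
};

// Middle of the chain: drops one hard-coded stopword by returning a null pointer.
class TinyStopper : public MiniHandler {
protected:
  char *handleWord(char *word, const char *) override {
    return std::strcmp(word, "the") == 0 ? nullptr : word;
  }
};

// Destination: records what arrives and never forwards anything.
class Collector : public MiniHandler {
public:
  std::vector<std::string> seen;
protected:
  char *handleWord(char *word, const char *original) override {
    seen.push_back(std::string(word) + "<-" + original);
    return word;
  }
};

// Build the chain Lowercaser -> TinyStopper -> Collector and push three tokens through it.
std::vector<std::string> runChainDemo() {
  Lowercaser lower;
  TinyStopper stopper;
  Collector sink;
  lower.setTextHandler(&stopper);
  stopper.setTextHandler(&sink);
  const char *tokens[] = {"The", "Quick", "Fox"};
  for (const char *t : tokens) lower.foundWord(t, t);
  return sink.seen;
}
```

In this sketch the caller's strings are never modified, because each handler works on its own buffered copy, and the original token reaches the destination unchanged alongside the transformed word.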

Design note: this interface might make more sense split into TextSource and TextDestination, with classes in the middle of the chain inheriting from both.


Member Enumeration Documentation

enum lemur::api::TextHandler::TokenType
 

Enumeration values:
BEGINDOC 
ENDDOC 
WORDTOK 
BEGINTAG 
ENDTAG 
SYMBOLTOK 
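
The page does not describe how foundToken uses these values. As a hedged sketch, one plausible reading is that each token type is routed to the handle* member with the matching name. The enum values are copied from the class summary; handlerFor is a hypothetical helper, and the mapping is an assumption based only on the member names.

```cpp
#include <string>

// Values copied from the TokenType enum listed in the class summary.
enum TokenType {
  BEGINDOC = 1, ENDDOC = 2, WORDTOK = 3, BEGINTAG = 4,
  ENDTAG = 5, SYMBOLTOK = 6
};

// Assumed mapping from a token type to the handle* member that would process it.
std::string handlerFor(TokenType type) {
  switch (type) {
    case BEGINDOC:  return "handleBeginDoc";
    case ENDDOC:    return "handleEndDoc";
    case WORDTOK:   return "handleWord";
    case BEGINTAG:  return "handleBeginTag";
    case ENDTAG:    return "handleEndTag";
    case SYMBOLTOK: return "handleSymbol";
  }
  return "unknown";  // value outside the documented range
}
```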


Constructor & Destructor Documentation

lemur::api::TextHandler::TextHandler ()  [inline]
 

virtual lemur::api::TextHandler::~TextHandler ()  [inline, virtual]
 


Member Function Documentation

virtual void lemur::api::TextHandler::destroyPrevHandler ()  [inline, protected, virtual]

The previous handler (prevHandler) is being destroyed; try to fix the chain.

virtual void lemur::api::TextHandler::destroyTextHandler ()  [inline, protected, virtual]

The next handler (textHandler) is being destroyed; try to fix the chain.
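
The page only says that these two functions try to fix the chain. One simple way to do that, sketched here as an assumption rather than as the library's code, is for a handler's destructor to notify both neighbours so that each clears its pointer to the dying object. ChainLink and chainIsRepaired are hypothetical names; the member names mirror the ones documented on this page.

```cpp
// Hypothetical stand-in that keeps only the two chain links.
class ChainLink {
public:
  ChainLink() : textHandler(nullptr), prevHandler(nullptr) {}

  // On destruction, tell both neighbours to drop their pointer to this object.
  virtual ~ChainLink() {
    if (textHandler != nullptr) textHandler->destroyPrevHandler();
    if (prevHandler != nullptr) prevHandler->destroyTextHandler();
  }

  // Link forward and record the back link on the target.
  void setTextHandler(ChainLink *th) {
    textHandler = th;
    if (th != nullptr) th->setPrevHandler(this);
  }
  ChainLink *getTextHandler() { return textHandler; }
  ChainLink *getPrevHandler() { return prevHandler; }

protected:
  void setPrevHandler(ChainLink *th) { prevHandler = th; }
  void destroyPrevHandler() { prevHandler = nullptr; }  // previous handler went away
  void destroyTextHandler() { textHandler = nullptr; }  // next handler went away

private:
  ChainLink *textHandler;  // next handler in the chain
  ChainLink *prevHandler;  // previous handler in the chain
};

// Delete the middle of a three-handler chain and check that no dangling pointers remain.
bool chainIsRepaired() {
  ChainLink first;
  ChainLink *middle = new ChainLink;
  ChainLink last;
  first.setTextHandler(middle);
  middle->setTextHandler(&last);
  delete middle;  // both neighbours are notified here
  return first.getTextHandler() == nullptr && last.getPrevHandler() == nullptr;
}
```

This sketch only clears the dangling pointers; it does not splice the two neighbours together, which a fuller repair might do.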

virtual void lemur::api::TextHandler::foundDoc (char *docno, const char *original)  [inline, virtual]
 

virtual void lemur::api::TextHandler::foundDoc (char *docno)  [inline, virtual]
 

Found a document with document number.

virtual void lemur::api::TextHandler::foundEndDoc ()  [inline, virtual]
 

Found end of doc.

virtual void lemur::api::TextHandler::foundSymbol (const char *sym)  [inline, virtual]

Found a symbol.

virtual void lemur::api::TextHandler::foundToken (TokenType type, const char *token=NULL, const char *orig=NULL, lemur::parse::PropertyList *properties=NULL)  [inline, virtual]
 

virtual void lemur::api::TextHandler::foundWord (char *word, const char *original)  [inline, virtual]
 

virtual void lemur::api::TextHandler::foundWord (char *word)  [inline, virtual]
 

Found a word.

virtual string lemur::api::TextHandler::getCategory () const  [inline, virtual]

Return the category of this TextHandler.

virtual string lemur::api::TextHandler::getIdentifier () const  [inline, virtual]
 

Return a unique identifier for this TextHandler object.

virtual TextHandler* lemur::api::TextHandler::getPrevHandler ()  [inline, virtual]
 

Get the TextHandler that this TextHandler gets info from.

virtual TextHandler* lemur::api::TextHandler::getTextHandler ()  [inline, virtual]
 

Get the TextHandler that this TextHandler will pass information on to.

virtual char* lemur::api::TextHandler::handleBeginDoc (char *docno, const char *original, lemur::parse::PropertyList *list)  [inline, virtual]

Handle the beginning of a document. The default implementation calls handleDoc for backward compatibility.

virtual char* lemur::api::TextHandler::handleBeginTag (char *tag, const char *original, lemur::parse::PropertyList *list)  [inline, virtual]
 

Handle a begin tag.

Reimplemented in lemur::parse::IndriTextHandler, and lemur::parse::ElemDocMgr.

virtual char* lemur::api::TextHandler::handleDoc (char *docno)  [inline, virtual]
 

Handle a doc.

Reimplemented in lemur::distrib::DocFreqIndexer, lemur::distrib::FreqCounter, lemur::parse::IndriTextHandler, lemur::parse::KeyfileTextHandler, lemur::parse::PropIndexTH, lemur::parse::KeyfileDocMgr, lemur::parse::WriterInQueryHandler, and lemur::parse::WriterTextHandler.

virtual void lemur::api::TextHandler::handleEndDoc ()  [inline, virtual]
 

Handle the end of the doc.

Reimplemented in lemur::distrib::DocFreqIndexer, lemur::parse::IndriTextHandler, and lemur::parse::KeyfileDocMgr.

virtual char* lemur::api::TextHandler::handleEndDoc (char *token, const char *original, lemur::parse::PropertyList *list)  [inline, virtual]

Handle the end of a document. The default implementation calls the old handleEndDoc for backward compatibility.

virtual char* lemur::api::TextHandler::handleEndTag (char *tag, const char *original, lemur::parse::PropertyList *list)  [inline, virtual]
 

Handle an end tag.

Reimplemented in lemur::parse::IndriTextHandler, and lemur::parse::ElemDocMgr.

virtual char* lemur::api::TextHandler::handleSymbol (char *sym)  [inline, virtual]

Handle a symbol, possibly transforming it.

Reimplemented in lemur::parse::WriterInQueryHandler, lemur::parse::StringQuery, and lemur::api::QueryDocument.

virtual char* lemur::api::TextHandler::handleSymbol (char *symbol, const char *original, lemur::parse::PropertyList *list)  [inline, virtual]

Handle a symbol. The default implementation calls the old handleSymbol for backward compatibility.

virtual char* lemur::api::TextHandler::handleWord (char *word)  [inline, virtual]
 

Handle a word, possibly transforming it.

Reimplemented in lemur::distrib::CtfIndexer, lemur::distrib::DocFreqIndexer, lemur::distrib::FreqCounter, lemur::parse::KeyfileTextHandler, lemur::parse::QueryTextHandler, lemur::parse::KeyfileDocMgr, lemur::api::Stemmer, lemur::api::Stopper, lemur::parse::WriterInQueryHandler, lemur::parse::WriterTextHandler, lemur::parse::StringQuery, lemur::parse::DocOffsetParser, and lemur::api::QueryDocument.

virtual char* lemur::api::TextHandler::handleWord (char *word, const char *original, lemur::parse::PropertyList *list)  [inline, virtual]

Handle a word. The default implementation calls the old handleWord for backward compatibility.

Reimplemented in lemur::parse::IndriTextHandler, lemur::parse::PropIndexTH, and lemur::parse::BrillPOSTokenizer.
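
The backward-compatibility delegation described for the three-argument handlers can be sketched as below. This is an illustration under stated assumptions, not the library source: BaseHandler, LegacyStopper, PropertyListStub and legacyOverrideStillRuns are hypothetical names, and PropertyListStub merely stands in for lemur::parse::PropertyList.

```cpp
#include <string>

// Hypothetical placeholder for lemur::parse::PropertyList.
struct PropertyListStub {};

class BaseHandler {
public:
  virtual ~BaseHandler() {}

  // Older one-argument form.
  virtual char *handleWord(char *word) { return word; }

  // Newer three-argument form. By default it ignores the extra arguments
  // and delegates to the older form, so existing subclasses keep working.
  virtual char *handleWord(char *word, const char * /*original*/,
                           PropertyListStub * /*list*/) {
    return handleWord(word);
  }
};

// A subclass written against the older interface: it overrides only the
// one-argument form.
class LegacyStopper : public BaseHandler {
public:
  using BaseHandler::handleWord;  // keep the three-argument overload visible
  char *handleWord(char *word) override {
    return (word != nullptr && std::string(word) == "the") ? nullptr : word;
  }
};

// Call the newer form through the base interface and check that the legacy
// override is the code that actually runs.
bool legacyOverrideStillRuns() {
  LegacyStopper stopper;
  BaseHandler &handler = stopper;
  char stop[] = "the";
  char keep[] = "lemur";
  return handler.handleWord(stop, "The", nullptr) == nullptr &&
         handler.handleWord(keep, "Lemur", nullptr) == keep;
}
```

The using-declaration matters in C++ because overriding one overload in a derived class otherwise hides the other overloads of the same name when the function is called through the derived type.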

virtual void lemur::api::TextHandler::setPrevHandler (TextHandler *th)  [inline, protected, virtual]
 

Set the TextHandler that this TextHandler gets info from.

virtual void lemur::api::TextHandler::setTextHandler (TextHandler *th)  [inline, virtual]
 

Set the TextHandler that this TextHandler will pass information on to.

virtual void lemur::api::TextHandler::writePropertyList (lemur::parse::PropertyList *list) const  [inline, virtual]
 

Write out the properties associated with this TextHandler into the given list.

Reimplemented in lemur::parse::ArabicStemmer, and lemur::api::Stopper.


Member Data Documentation

char lemur::api::TextHandler::buffer[MAXWORDSIZE] [protected]
 

string lemur::api::TextHandler::cat [protected]
 

const string lemur::api::TextHandler::category = "TextHandler" [static]
 

Reimplemented in lemur::api::Parser, lemur::api::Stemmer, and lemur::api::Stopper.

string lemur::api::TextHandler::iden [protected]
 

const string lemur::api::TextHandler::identifier = "TextHandler" [static]
 

Reimplemented in lemur::parse::ArabicParser, lemur::parse::ArabicStemmer, lemur::parse::BrillPOSParser, lemur::parse::ChineseCharParser, lemur::parse::ChineseParser, lemur::parse::IdentifinderParser, lemur::parse::InqArabicParser, lemur::parse::InQueryOpParser, lemur::parse::KStemmer, lemur::api::Parser, lemur::parse::PorterStemmer, lemur::parse::ReutersParser, lemur::api::Stemmer, lemur::api::Stopper, lemur::parse::TrecParser, and lemur::parse::WebParser.

TextHandler* lemur::api::TextHandler::prevHandler [protected]
 

The previous textHandler in the chain.

TextHandler* lemur::api::TextHandler::textHandler [protected]
 

The next textHandler in the chain.


The documentation for this class was generated from the following files:
Generated on Tue Jun 15 11:03:05 2010 for Lemur by doxygen 1.3.4