
lemur::parse::BrillPOSTokenizer Class Reference

#include <BrillPOSTokenizer.hpp>

Inherits lemur::api::TextHandler.

Public Member Functions

 BrillPOSTokenizer ()
 make a new POSTokenizer with default split character "/"

 BrillPOSTokenizer (char s)
 make a new POSTokenizer with a different splitting character

void setDelimiter (char s)
 set a new delimiter character to split into tokens

char * handleWord (char *word, const char *original, PropertyList *list)

Protected Attributes

char splitter
Property pos

Detailed Description

This TextHandler parses tokens that have been put through Brill's POS tagger, which are usually of the form "word/POS". It splits each token at the delimiter and sends the word as is along the pipeline, with the POS added as a Property. TextHandlers further down the chain can access the POS by getting the Property named "POS" from the PropertyList. Generally, this TextHandler should be chained after a tokenizing TextHandler, such as the WebParser, and before the Stopper or Stemmer.
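
The split described above can be illustrated with a small standalone sketch. It does not use Lemur and is not the library's implementation: the helper name splitTaggedToken is hypothetical, and splitting at the last occurrence of the delimiter (so that a token such as "1/2/CD" keeps "1/2" as the word) is an assumption rather than documented behaviour.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper, not part of Lemur: splits "word/POS" into {word, POS}.
// Assumption: the tag follows the LAST delimiter in the token.
// A token with no delimiter is returned unchanged with an empty tag.
std::pair<std::string, std::string> splitTaggedToken(const std::string& token,
                                                     char delimiter = '/') {
  assert(delimiter != '\0');  // precondition: the delimiter is a real character
  const std::string::size_type at = token.rfind(delimiter);
  if (at == std::string::npos) {
    return {token, std::string()};
  }
  return {token.substr(0, at), token.substr(at + 1)};
}
```

Under these assumptions, splitTaggedToken("dog/NN") yields the pair ("dog", "NN"), and a token without the delimiter comes back with an empty tag.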


Constructor & Destructor Documentation

lemur::parse::BrillPOSTokenizer::BrillPOSTokenizer ()
 

make a new POSTokenizer with default split character "/"

lemur::parse::BrillPOSTokenizer::BrillPOSTokenizer (char s)
 

make a new POSTokenizer with a different splitting character


Member Function Documentation

char * lemur::parse::BrillPOSTokenizer::handleWord (char *word, const char *original, PropertyList *list) [virtual]
 

split the token at the delimiter and send the word as is along the pipeline, with the POS added as a Property

Reimplemented from lemur::api::TextHandler.
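
How a later stage might consume the tag can be sketched as follows. This is a standalone illustration, not Lemur code: Properties (a plain map standing in for PropertyList), PosSplitter, and NounCollector are all hypothetical names, the two-stage wiring is simplified, and the "NN" prefix check assumes Penn Treebank-style noun tags.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Stand-in for PropertyList (assumption): a plain name -> value map.
using Properties = std::map<std::string, std::string>;

// Hypothetical downstream stage: keeps only words whose "POS" property
// starts with "NN".
struct NounCollector {
  std::vector<std::string> nouns;
  void handleWord(const std::string& word, const Properties& props) {
    const auto it = props.find("POS");
    if (it != props.end() && it->second.compare(0, 2, "NN") == 0) {
      nouns.push_back(word);
    }
  }
};

// Hypothetical upstream stage: splits at the last delimiter, stores the tag
// under "POS", then forwards the bare word to the next stage.
struct PosSplitter {
  char splitter;
  NounCollector* next;
  void handleWord(const std::string& token) {
    assert(next != nullptr);  // the chain must be wired before use
    Properties props;
    std::string word = token;
    const std::string::size_type at = token.rfind(splitter);
    if (at != std::string::npos) {
      word = token.substr(0, at);
      props["POS"] = token.substr(at + 1);
    }
    next->handleWord(word, props);
  }
};
```

In this sketch an untagged token is forwarded with no "POS" entry, so the downstream stage simply ignores it.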

void lemur::parse::BrillPOSTokenizer::setDelimiter (char s) [inline]
 

set a new delimiter character to split into tokens


Member Data Documentation

Property lemur::parse::BrillPOSTokenizer::pos [protected]
 

char lemur::parse::BrillPOSTokenizer::splitter [protected]
 


Generated on Tue Jun 15 11:03:06 2010 for Lemur by doxygen 1.3.4