
lemur::parse::BrillPOSParser Class Reference

#include <BrillPOSParser.hpp>

Inheritance diagram for lemur::parse::BrillPOSParser:

lemur::api::Parser, lemur::api::TextHandler

Public Member Functions

 BrillPOSParser ()
void parseFile (const string &filename)
 Parse a file.

void parseBuffer (char *buf, int len)
 Parse a buffer.

long fileTell () const
 Return the current byte position of the file being parsed.


Static Public Attributes

const string identifier = "brill"

Private Member Functions

void doParse ()
 Actual parsing action flow.


Private Attributes

int state
 The state of the parser.

int poscount
 Counts the position of the word in the document.

Property wordpos
 Keeps one property and changes its values.

Property tag
LinkedPropertyList proplist
 list


Detailed Description

Parses documents that use document separation tags similar to NIST's Web format: <DOC></DOC> around documents and <DOCNO></DOCNO> around document IDs. Recognizes tokens containing a "/" (slash), which is the default separator in the output of Brill's part-of-speech tagger. Use with BrillPOSTokenizer. This parser also recognizes ./. ?/. and !/. as end-of-sentence markers and sends along an [eos] token to be indexed. Case folding is applied to words that are not in the acronym list. Contraction suffixes and possessive suffixes are stripped.
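The token handling described above can be illustrated with a minimal standalone sketch. It is not Lemur's implementation: the helper names splitTagged and isEndOfSentence are hypothetical, and splitting at the last "/" is an assumption made so that words which themselves contain a slash keep it.

```cpp
#include <string>
#include <utility>

// Hypothetical helper, not part of Lemur: split a "word/TAG" token.
// Assumption: the tag follows the LAST '/', so "1/2/CD" yields word "1/2".
// Returns {word, tag}; the tag is empty when the token has no '/'.
std::pair<std::string, std::string> splitTagged(const std::string &token) {
    const std::string::size_type slash = token.rfind('/');
    if (slash == std::string::npos) {
        return {token, ""};
    }
    return {token.substr(0, slash), token.substr(slash + 1)};
}

// Hypothetical helper: true for the three end-of-sentence forms
// named in the description (./. ?/. and !/.).
bool isEndOfSentence(const std::string &token) {
    return token == "./." || token == "?/." || token == "!/.";
}
```

With these helpers, splitTagged("dog/NN") yields the word "dog" and the tag "NN", and a caller could emit an [eos] token whenever isEndOfSentence returns true.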

U.S.A., USA's, and USAs are converted to USA. Does not recognize acronyms with numbers.
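The normalization rules can be approximated as follows. This is a sketch under stated assumptions, not the parser's actual code: normalizeWord and allUpper are hypothetical names, an all-uppercase test stands in for the acronym list the parser consults, and only the possessive "'s" suffix is handled (other contraction suffixes are left out).

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: true if s is non-empty and consists only of
// uppercase letters. Digits fail the test, which mirrors the note that
// acronyms with numbers are not recognized.
static bool allUpper(const std::string &s) {
    if (s.empty()) return false;
    for (unsigned char c : s) {
        if (!std::isupper(c)) return false;
    }
    return true;
}

// Hypothetical sketch of the normalization described above.
// Assumption: "all uppercase" replaces the real acronym-list lookup.
std::string normalizeWord(const std::string &word) {
    // Drop periods so that "U.S.A." becomes "USA".
    std::string w;
    for (char c : word) {
        if (c != '.') w += c;
    }
    // Strip a possessive "'s" suffix ("USA's" -> "USA").
    if (w.size() > 2 && w.compare(w.size() - 2, 2, "'s") == 0) {
        w.erase(w.size() - 2);
    }
    // Acronyms are returned without case folding.
    if (allUpper(w)) return w;
    // Plural acronym: uppercase letters followed by one lowercase 's'.
    if (w.size() > 2 && w.back() == 's' &&
        allUpper(w.substr(0, w.size() - 1))) {
        return w.substr(0, w.size() - 1);
    }
    // Everything else is case folded to lowercase.
    for (char &c : w) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return w;
}
```

Under these assumptions, "U.S.A.", "USA's", and "USAs" all map to "USA", while an ordinary word such as "Hello" is folded to "hello".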


Constructor & Destructor Documentation

lemur::parse::BrillPOSParser::BrillPOSParser ()
 


Member Function Documentation

void lemur::parse::BrillPOSParser::doParse () [private]
 

Actual parsing action flow.

long lemur::parse::BrillPOSParser::fileTell () const [virtual]
 

Return the current byte position of the file being parsed.

Implements lemur::api::Parser.

void lemur::parse::BrillPOSParser::parseBuffer (char *buf, int len) [virtual]
 

Parse a buffer.

Implements lemur::api::Parser.

void lemur::parse::BrillPOSParser::parseFile (const string &filename) [virtual]
 

Parse a file.

Implements lemur::api::Parser.


Member Data Documentation

const string lemur::parse::BrillPOSParser::identifier = "brill" [static]
 

Reimplemented from lemur::api::Parser.

int lemur::parse::BrillPOSParser::poscount [private]
 

Counts the position of the word in the document.

LinkedPropertyList lemur::parse::BrillPOSParser::proplist [private]
 

list

int lemur::parse::BrillPOSParser::state [private]
 

The state of the parser.

Property lemur::parse::BrillPOSParser::tag [private]
 

Property lemur::parse::BrillPOSParser::wordpos [private]
 

Keeps one property and changes its values.


The documentation for this class was generated from the following files:
Generated on Tue Jun 15 11:03:06 2010 for Lemur by doxygen 1.3.4