
lemur::api::Parser Class Reference

Provides a generic parser interface.

#include <Parser.hpp>

Inheritance diagram for lemur::api::Parser:

Inherits lemur::api::TextHandler.

Inherited by lemur::parse::ArabicParser, lemur::parse::BrillPOSParser, lemur::parse::ChineseCharParser, lemur::parse::ChineseParser, lemur::parse::IdentifinderParser, lemur::parse::InqArabicParser, lemur::parse::InQueryOpParser, lemur::parse::ReutersParser, lemur::parse::TrecParser, and lemur::parse::WebParser.

Public Member Functions

 Parser ()
virtual ~Parser ()
virtual void parse (const string &filename)
virtual void parseFile (const string &filename)=0
virtual void parseBuffer (char *buf, int len)=0
 Parse a buffer.

virtual void setAcroList (const lemur::utility::WordSet *acronyms)
virtual void setAcroList (string filename)
 Set the acronym list from this file.

virtual long fileTell () const =0
 Returns the current byte position of the file being parsed.

virtual long getDocBytePos () const
 Returns the byte position at the beginning of the current document.

virtual const string getParseFile () const
 Returns the name of the file being parsed.


Static Public Attributes

const string category = "parser"
const string identifier = "parser"

Protected Member Functions

bool isAcronym (const char *word)
void clearAcros ()
 Clears the internal acronym list.


Protected Attributes

long docpos
string parsefile

Private Attributes

lemur::utility::WordSet * myacros
 The acronym list.

const lemur::utility::WordSet * borrowedacros

Detailed Description

Provides a generic parser interface.

Assumes that the parser uses an acronym list. If, when developing your parser, you do not use an acronym list, you can just provide an empty implementation of the setAcroList function.
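
The advice above can be illustrated with a minimal sketch. It does not use the Lemur headers: sketch::ParserLike and sketch::WordSet are hypothetical stand-ins that only mirror some of the member signatures listed on this page, hasAcroList is an invented helper, and WhitespaceParser is an invented subclass. The point is only that a parser with no use for acronyms can override both setAcroList overloads with empty bodies while still implementing the pure virtual members.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

namespace sketch {

// Hypothetical placeholder for lemur::utility::WordSet.
class WordSet {};

// Stand-in that mirrors some signatures from this page.
// It is NOT the real lemur::api::Parser and omits the TextHandler base.
class ParserLike {
public:
  ParserLike() : acros(0) {}
  virtual ~ParserLike() {}
  virtual void parseFile(const std::string &filename) = 0;
  virtual void parseBuffer(char *buf, int len) = 0;
  virtual long fileTell() const = 0;
  // Assumed default: remember the caller-owned list.
  virtual void setAcroList(const WordSet *acronyms) { acros = acronyms; }
  // Loading a list from a file is omitted in this sketch.
  virtual void setAcroList(std::string) {}
  // Invented helper so the effect of the overrides can be observed.
  bool hasAcroList() const { return acros != 0; }
protected:
  const WordSet *acros;
};

// A parser that splits on whitespace and ignores acronym lists entirely.
class WhitespaceParser : public ParserLike {
public:
  WhitespaceParser() : pos(0) {}

  // Empty implementations: this parser does not use an acronym list.
  void setAcroList(const WordSet *) {}
  void setAcroList(std::string) {}

  void parseFile(const std::string &filename) {
    std::ifstream in(filename.c_str(), std::ios::binary);
    std::string data((std::istreambuf_iterator<char>(in)),
                     std::istreambuf_iterator<char>());
    if (!data.empty()) parseBuffer(&data[0], static_cast<int>(data.size()));
  }

  void parseBuffer(char *buf, int len) {
    assert(buf != 0 || len == 0);
    tokens.clear();
    std::string cur;
    for (int i = 0; i < len; ++i) {
      const char c = buf[i];
      if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
        if (!cur.empty()) { tokens.push_back(cur); cur.clear(); }
      } else {
        cur += c;
      }
    }
    if (!cur.empty()) tokens.push_back(cur);
    pos = len;  // bytes consumed so far
  }

  long fileTell() const { return pos; }

  std::vector<std::string> tokens;  // tokens from the most recent parse

private:
  long pos;
};

}  // namespace sketch
```

Because both overloads are overridden, neither is hidden by the other in the subclass, and callers that hand the parser an acronym list are simply ignored.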


Constructor & Destructor Documentation

lemur::api::Parser::Parser ()

lemur::api::Parser::~Parser () [virtual]


Member Function Documentation

void lemur::api::Parser::clearAcros () [protected]

Clears the internal acronym list.

virtual long lemur::api::Parser::fileTell () const [pure virtual]

Returns the current byte position of the file being parsed.

Implemented in lemur::parse::ArabicParser, lemur::parse::BrillPOSParser, lemur::parse::ChineseCharParser, lemur::parse::ChineseParser, lemur::parse::IdentifinderParser, lemur::parse::InqArabicParser, lemur::parse::InQueryOpParser, lemur::parse::ReutersParser, lemur::parse::TrecParser, and lemur::parse::WebParser.

virtual long lemur::api::Parser::getDocBytePos () const [inline, virtual]

Returns the byte position at the beginning of the current document.

virtual const string lemur::api::Parser::getParseFile () const [inline, virtual]

Returns the name of the file being parsed.

bool lemur::api::Parser::isAcronym (const char * word) [protected]

Checks to see if the word is in the acronym list. Returns false if the list is not set.

void lemur::api::Parser::parse (const string & filename) [virtual]

Parses a file. Use parseFile instead; this method will be deprecated in the future.

virtual void lemur::api::Parser::parseBuffer (char * buf, int len) [pure virtual]

Parse a buffer.

Implemented in lemur::parse::ArabicParser, lemur::parse::BrillPOSParser, lemur::parse::ChineseCharParser, lemur::parse::ChineseParser, lemur::parse::IdentifinderParser, lemur::parse::InqArabicParser, lemur::parse::InQueryOpParser, lemur::parse::ReutersParser, lemur::parse::TrecParser, and lemur::parse::WebParser.

virtual void lemur::api::Parser::parseFile (const string & filename) [pure virtual]

Parses a file. Implementing subclasses should set the parsefile string.

Implemented in lemur::parse::ArabicParser, lemur::parse::BrillPOSParser, lemur::parse::ChineseCharParser, lemur::parse::ChineseParser, lemur::parse::IdentifinderParser, lemur::parse::InqArabicParser, lemur::parse::InQueryOpParser, lemur::parse::ReutersParser, lemur::parse::TrecParser, and lemur::parse::WebParser.
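
A rough sketch of the contract described for parse and parseFile follows. The classes are hypothetical stand-ins, not the Lemur sources. Two things are assumptions made for illustration: that parse simply forwards to parseFile, and that parseFile resets docpos to 0. This page states only that subclasses should set parsefile and that parse is slated for deprecation.

```cpp
#include <cassert>
#include <string>

namespace sketch {

// Stand-in mirroring a few members of the interface on this page.
class ParserLike {
public:
  ParserLike() : docpos(0) {}
  virtual ~ParserLike() {}

  // Legacy entry point; forwarding to parseFile is an assumption.
  virtual void parse(const std::string &filename) { parseFile(filename); }
  virtual void parseFile(const std::string &filename) = 0;

  virtual long getDocBytePos() const { return docpos; }
  virtual const std::string getParseFile() const { return parsefile; }

protected:
  long docpos;            // byte position where the current document starts
  std::string parsefile;  // name of the file being parsed
};

// Invented subclass that records the file name as the page asks.
class RecordingParser : public ParserLike {
public:
  void parseFile(const std::string &filename) {
    assert(!filename.empty());
    parsefile = filename;  // lets getParseFile() report the current file
    docpos = 0;            // assumed: a new file starts at byte 0
    // A real parser would open the file and tokenize it here.
  }
};

}  // namespace sketch
```

With this arrangement, callers still using parse and callers using parseFile both leave getParseFile reporting the same file name.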

void lemur::api::Parser::setAcroList (string filename) [virtual]

Set the acronym list from this file.

void lemur::api::Parser::setAcroList (const lemur::utility::WordSet * acronyms) [virtual]

Sets the acronym list. This can be an empty implementation if the parser does not use a list to handle acronyms. The WordSet still belongs to the caller.
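
The ownership rule for the acronym list, together with the isAcronym behavior documented earlier, can be sketched as below. This is an illustration under assumptions, not the Lemur implementation: WordSet here is a hypothetical stand-in built on std::set, AcronymHolder is an invented class, loadAcroList takes a stream instead of a file name to keep the example self-contained, and the way the two pointers (named after myacros and borrowedacros) interact is a guess.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

namespace sketch {

// Hypothetical stand-in for lemur::utility::WordSet.
class WordSet {
public:
  void add(const std::string &w) { words.insert(w); }
  bool contains(const std::string &w) const { return words.count(w) != 0; }
private:
  std::set<std::string> words;
};

class AcronymHolder {
public:
  AcronymHolder() : myacros(0), borrowedacros(0) {}
  ~AcronymHolder() { clearAcros(); }

  // Borrow a list: the caller keeps ownership, so it is never deleted here.
  void setAcroList(const WordSet *acronyms) {
    clearAcros();
    borrowedacros = acronyms;
  }

  // Build a list that this object owns (one whitespace-separated word each).
  void loadAcroList(std::istream &in) {
    clearAcros();
    myacros = new WordSet();
    std::string w;
    while (in >> w) myacros->add(w);
  }

  // Returns false when no list has been set.
  bool isAcronym(const char *word) const {
    assert(word != 0);
    if (borrowedacros) return borrowedacros->contains(word);
    if (myacros) return myacros->contains(word);
    return false;
  }

  // Frees the owned list and forgets the borrowed one (assumed behavior).
  void clearAcros() {
    delete myacros;
    myacros = 0;
    borrowedacros = 0;
  }

private:
  AcronymHolder(const AcronymHolder &);             // not copyable
  AcronymHolder &operator=(const AcronymHolder &);  // not assignable
  WordSet *myacros;              // owned
  const WordSet *borrowedacros;  // owned by the caller
};

}  // namespace sketch
```

A caller might pass the address of its own WordSet to setAcroList and must then keep that object alive for as long as the holder may consult it; an std::istringstream is enough to exercise loadAcroList.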


Member Data Documentation

const lemur::utility::WordSet* lemur::api::Parser::borrowedacros [private]
 

const string lemur::api::Parser::category = "parser" [static]
 

Reimplemented from lemur::api::TextHandler.

long lemur::api::Parser::docpos [protected]
 

const string lemur::api::Parser::identifier = "parser" [static]
 

Reimplemented from lemur::api::TextHandler.

Reimplemented in lemur::parse::ArabicParser, lemur::parse::BrillPOSParser, lemur::parse::ChineseCharParser, lemur::parse::ChineseParser, lemur::parse::IdentifinderParser, lemur::parse::InqArabicParser, lemur::parse::InQueryOpParser, lemur::parse::ReutersParser, lemur::parse::TrecParser, and lemur::parse::WebParser.

lemur::utility::WordSet* lemur::api::Parser::myacros [private]
 

The acronym list.

string lemur::api::Parser::parsefile [protected]
 


The documentation for this class was generated from the following files:
Generated on Tue Jun 15 11:03:05 2010 for Lemur by doxygen 1.3.4