#include <Parser.hpp>
Inheritance diagram for lemur::api::Parser:
Public Member Functions | |
Parser () | |
virtual | ~Parser () |
virtual void | parse (const string &filename) |
virtual void | parseFile (const string &filename)=0 |
virtual void | parseBuffer (char *buf, int len)=0 |
Parse a buffer. | |
virtual void | setAcroList (const lemur::utility::WordSet *acronyms) |
virtual void | setAcroList (string filename) |
Set the acronym list from this file. | |
virtual long | fileTell () const =0 |
return the current byte position of the file being parsed | |
virtual long | getDocBytePos () const |
return the byte position at the beginning of the current document | |
virtual const string | getParseFile () const |
return the name of the file being parsed | |
Static Public Attributes | |
const string | category = "parser" |
const string | identifier = "parser" |
Protected Member Functions | |
bool | isAcronym (const char *word) |
void | clearAcros () |
clears internal acronym list | |
Protected Attributes | |
long | docpos |
string | parsefile |
Private Attributes | |
lemur::utility::WordSet * | myacros |
The acronym list. | |
const lemur::utility::WordSet * | borrowedacros |
This class assumes that the parser uses an acronym list. If your parser does not use one, you can simply provide an empty implementation of the setAcroList function.
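As a rough illustration of the note above, the sketch below shows a parser that ignores acronyms by overriding both setAcroList overloads with empty bodies. `ParserSketch`, `WordSetSketch`, `PlainParser`, `hasAcroList` and `lastFile` are hypothetical names invented for this example; they only mirror the signatures listed on this page so that the code compiles without the Lemur headers, and the default behavior given to the base class is an assumption.

```cpp
#include <string>

// Hypothetical stand-in for lemur::utility::WordSet (not the real class).
struct WordSetSketch {};

// Hypothetical stand-in that mirrors a few signatures from this page.
class ParserSketch {
public:
  ParserSketch() : acros(nullptr) {}
  virtual ~ParserSketch() {}
  virtual void parseFile(const std::string &filename) = 0;
  // Assumed default behavior: remember the caller's list.
  virtual void setAcroList(const WordSetSketch *acronyms) { acros = acronyms; }
  // Assumed default behavior: a real parser would load the list from the file.
  virtual void setAcroList(std::string /*filename*/) {}
  bool hasAcroList() const { return acros != nullptr; }
private:
  const WordSetSketch *acros;
};

// A parser that does not use acronyms: both overloads are overridden as no-ops.
class PlainParser : public ParserSketch {
public:
  void parseFile(const std::string &filename) override { lastFile = filename; }
  void setAcroList(const WordSetSketch *) override {}
  void setAcroList(std::string) override {}
  std::string lastFile;  // records the most recent file name, for illustration only
};
```

With this arrangement, callers can still invoke setAcroList on any parser through the base interface; a parser such as `PlainParser` simply discards the request.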

void clearAcros ()
Clears the internal acronym list.

virtual long fileTell () const =0
Returns the current byte position of the file being parsed.
Implemented in lemur::parse::ArabicParser, lemur::parse::BrillPOSParser, lemur::parse::ChineseCharParser, lemur::parse::ChineseParser, lemur::parse::IdentifinderParser, lemur::parse::InqArabicParser, lemur::parse::InQueryOpParser, lemur::parse::ReutersParser, lemur::parse::TrecParser, and lemur::parse::WebParser.

virtual long getDocBytePos () const
Returns the byte position at the beginning of the current document.

virtual const string getParseFile () const
Returns the name of the file being parsed.

bool isAcronym (const char *word)
Checks whether the word is in the acronym list. Returns false if the list is not set.
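The lookup rule described here (false when no list is set, otherwise a membership test) can be sketched as a free function. This is only an illustration: `AcroSet` is a hypothetical stand-in built on `std::set`, `isAcronymSketch` is an invented name, and the exact-match, case-sensitive comparison is an assumption of the sketch rather than documented behavior of lemur::utility::WordSet.

```cpp
#include <set>
#include <string>

// Hypothetical stand-in for the acronym list; the real class is lemur::utility::WordSet.
typedef std::set<std::string> AcroSet;

// Returns true only when a list has been set and it contains the word.
// Matching is exact and case-sensitive in this sketch (an assumption).
bool isAcronymSketch(const AcroSet *acros, const char *word) {
  if (acros == nullptr || word == nullptr) {
    return false;  // no list set (or no word given): never an acronym
  }
  return acros->count(word) > 0;
}
```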

virtual void parse (const string &filename)
Parses a file. Use parseFile instead; this method will be deprecated in the future.

virtual void parseFile (const string &filename)=0
Parses a file. Implementing subclasses should set the parsefile string.
Implemented in lemur::parse::ArabicParser, lemur::parse::BrillPOSParser, lemur::parse::ChineseCharParser, lemur::parse::ChineseParser, lemur::parse::IdentifinderParser, lemur::parse::InqArabicParser, lemur::parse::InQueryOpParser, lemur::parse::ReutersParser, lemur::parse::TrecParser, and lemur::parse::WebParser.
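To make the contract for implementing subclasses more concrete, here is a minimal sketch of a parseFile override that records the file name in parsefile and keeps the two byte positions up to date. `ParserSketch` is a hypothetical stand-in for the base class (it mirrors only the members listed on this page), and `LineParser`, which treats every line as one "document", is an invented toy, not one of the Lemur parsers. The sketch assumes `\n` line endings.

```cpp
#include <fstream>
#include <string>

// Hypothetical stand-in mirroring part of the interface documented on this page.
class ParserSketch {
public:
  ParserSketch() : docpos(0) {}
  virtual ~ParserSketch() {}
  virtual void parseFile(const std::string &filename) = 0;
  virtual long fileTell() const = 0;
  virtual long getDocBytePos() const { return docpos; }
  virtual const std::string getParseFile() const { return parsefile; }
protected:
  long docpos;            // byte position where the current document starts
  std::string parsefile;  // name of the file being parsed
};

// Toy parser: each line of the file counts as one document.
class LineParser : public ParserSketch {
public:
  LineParser() : pos(0) {}
  void parseFile(const std::string &filename) override {
    parsefile = filename;  // subclasses are expected to set this
    pos = 0;
    docpos = 0;
    std::ifstream in(filename.c_str(), std::ios::binary);
    std::string line;
    while (std::getline(in, line)) {
      docpos = pos;  // the document that was just read started here
      // Count the newline unless the last line ended at end-of-file without one.
      pos += static_cast<long>(line.size()) + (in.eof() ? 0 : 1);
    }
  }
  long fileTell() const override { return pos; }
private:
  long pos;  // bytes consumed so far
};
```

After a call to parseFile, getParseFile returns the file name, getDocBytePos returns the start of the last document read, and fileTell returns the number of bytes consumed.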

virtual void setAcroList (string filename)
Sets the acronym list from this file.

virtual void setAcroList (const lemur::utility::WordSet *acronyms)
Sets the acronym list. The implementation can be empty if the parser is not designed to handle acronyms by using a list. The WordSet still belongs to the caller.
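The ownership rule stated here can be sketched with two pointers, one owned and one borrowed. The member names myacros and borrowedacros come from the private attributes listed on this page, but the way they are used below is an assumption suggested by their names, not something this page documents. `AcroHolder` and `AcroSet` are hypothetical, and the stream-based overload stands in for the file-name overload so that the example needs no file on disk.

```cpp
#include <set>
#include <sstream>  // std::istream; std::istringstream is handy for feeding words in
#include <string>

typedef std::set<std::string> AcroSet;  // hypothetical stand-in for lemur::utility::WordSet

class AcroHolder {
public:
  AcroHolder() : myacros(nullptr), borrowedacros(nullptr) {}
  ~AcroHolder() { delete myacros; }  // the borrowed list is never deleted here
  AcroHolder(const AcroHolder &) = delete;
  AcroHolder &operator=(const AcroHolder &) = delete;

  // Borrow a caller-owned list; the caller must keep it alive while it is in use.
  void setAcroList(const AcroSet *acronyms) {
    clearAcros();
    borrowedacros = acronyms;
  }

  // Build an internally owned list from whitespace-separated words.
  // (Stands in for the file-name overload; the stream parameter is an assumption.)
  void setAcroList(std::istream &in) {
    clearAcros();
    myacros = new AcroSet;
    std::string word;
    while (in >> word) myacros->insert(word);
  }

  // Drop both lists, freeing only the one this object owns.
  void clearAcros() {
    delete myacros;
    myacros = nullptr;
    borrowedacros = nullptr;
  }

  bool isAcronym(const char *word) const {
    const AcroSet *list = borrowedacros ? borrowedacros : myacros;
    return list != nullptr && word != nullptr && list->count(word) > 0;
  }

private:
  AcroSet *myacros;              // owned by this object
  const AcroSet *borrowedacros;  // owned by the caller
};
```

Because the borrowed pointer is never deleted, the caller's set is still valid after the holder is destroyed, which is the behavior the description asks for.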

Reimplemented from lemur::api::TextHandler.

lemur::utility::WordSet * myacros
The acronym list.