#include <DocStream.hpp>
[Inheritance diagram for lemur::api::DocStream]
Public Member Functions
| Declaration | Description |
|---|---|
| virtual ~DocStream () | |
| virtual void startDocIteration ()=0 | Start document iteration. |
| virtual bool hasMore ()=0 | Return true if another document is available. |
| virtual Document * nextDoc ()=0 | Return a pointer to the next document (static memory; do not delete the returned instance). Call hasMore() before calling nextDoc(). |
DocStream is an abstract interface for a collection of documents. A given implementation can have its own tokenization, document header format, and so on, and signals this by returning a specialized Document instance from nextDoc().
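To make the iteration protocol concrete, here is a minimal, self-contained sketch. It does not use the Lemur headers: `Document`, `DocStream`, the in-memory `VectorDocStream`, and the helper `collectIds` are simplified, hypothetical stand-ins. In this sketch `nextDoc()` honours the documented contract by refilling and returning one member object, so callers must not delete the pointer, and each pointer stays meaningful only until the next call.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for lemur::api::Document.
class Document {
public:
  virtual ~Document() = default;
  std::string id;
  std::string text;
};

// Hypothetical stand-in mirroring the three pure virtuals of DocStream.
class DocStream {
public:
  virtual ~DocStream() = default;
  virtual void startDocIteration() = 0;
  virtual bool hasMore() = 0;
  virtual Document *nextDoc() = 0;
};

// Hypothetical in-memory implementation: documents are (id, text) pairs.
class VectorDocStream : public DocStream {
public:
  explicit VectorDocStream(std::vector<std::pair<std::string, std::string>> docs)
      : docs_(std::move(docs)) {}

  void startDocIteration() override { pos_ = 0; }
  bool hasMore() override { return pos_ < docs_.size(); }

  // Refills and returns the same member instance on every call, so the
  // caller must not delete it (the "static memory" contract).
  Document *nextDoc() override {
    assert(hasMore() && "call hasMore() before nextDoc()");
    current_.id = docs_[pos_].first;
    current_.text = docs_[pos_].second;
    ++pos_;
    return &current_;
  }

private:
  std::vector<std::pair<std::string, std::string>> docs_;
  std::size_t pos_ = 0;
  Document current_;
};

// Consumer written only against the abstract interface.
std::vector<std::string> collectIds(DocStream &stream) {
  std::vector<std::string> ids;
  stream.startDocIteration();
  while (stream.hasMore()) {          // always check before nextDoc()
    Document *doc = stream.nextDoc(); // borrowed pointer: never delete
    ids.push_back(doc->id);           // copy out anything that must outlive the call
  }
  return ids;
}
```

Because `collectIds` sees only `DocStream &`, the same loop would work unchanged for any other implementation; the copy into `ids` is what makes the reused-instance design safe for the caller.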
The following example shows how to support an index that records term positions:
// A DocStream that handles positions
class PosDocStream : public DocStream {
public:
  ...
  Document *nextDoc() {
    return (new PosDocument(...)); // returns a special Document
  }
  ...
};

// A Document that has position information
class PosDocument : public Document {
public:
  ...
  TokenTerm *nextTerm() {
    return (new PosTerm(...)); // returns a special Term
  }
};

// A Term that has a position
class PosTerm : public TokenTerm {
public:
  int getPosition() { ... }
};

// An indexer that records term positions
class PosIndex : public Index {
  ...
  PosDocStream *db;
  ...
  // when indexing:
  db->startDocIteration();
  while (db->hasMore()) {
    Document *doc = db->nextDoc(); // we'll actually get a PosDocument
    doc->startTermIteration();
    while (doc->hasMore()) {
      // note the down-cast!
      PosTerm *term = static_cast<PosTerm *>(doc->nextTerm());
      term->getPosition();
      term->spelling();
      ...
    }
  }
  ...
};
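The position example relies on an unchecked down-cast. Below is a compilable, self-contained version of the same idea under simplifying assumptions: `TokenTerm`, `Document`, and `DocStream` are minimal hypothetical stand-ins rather than the Lemur classes, positions are word offsets in whitespace-tokenized text, and the helper `indexPositions` is illustrative only. `dynamic_cast` replaces the C-style cast so that a term without position information is skipped instead of misread. Both `nextDoc()` and `nextTerm()` return reused member objects, in line with the "do not delete" contract.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal stand-ins for the Lemur base classes.
class TokenTerm {
public:
  virtual ~TokenTerm() = default;
  const std::string &spelling() const { return spelling_; }
protected:
  std::string spelling_;
};

class Document {
public:
  virtual ~Document() = default;
  virtual void startTermIteration() = 0;
  virtual bool hasMore() = 0;
  virtual TokenTerm *nextTerm() = 0;
};

class DocStream {
public:
  virtual ~DocStream() = default;
  virtual void startDocIteration() = 0;
  virtual bool hasMore() = 0;
  virtual Document *nextDoc() = 0;
};

// A term that also knows its word offset in the document.
class PosTerm : public TokenTerm {
public:
  void set(std::string s, int pos) { spelling_ = std::move(s); pos_ = pos; }
  int getPosition() const { return pos_; }
private:
  int pos_ = 0;
};

// A document that yields PosTerm objects (one reused instance).
class PosDocument : public Document {
public:
  void load(const std::string &text) {
    words_.clear();
    std::istringstream in(text);
    for (std::string w; in >> w;) words_.push_back(w);
  }
  void startTermIteration() override { next_ = 0; }
  bool hasMore() override { return next_ < words_.size(); }
  TokenTerm *nextTerm() override {
    assert(hasMore());
    term_.set(words_[next_], static_cast<int>(next_));
    ++next_;
    return &term_;
  }
private:
  std::vector<std::string> words_;
  std::size_t next_ = 0;
  PosTerm term_;
};

// A stream over in-memory texts that yields PosDocument objects.
class PosDocStream : public DocStream {
public:
  explicit PosDocStream(std::vector<std::string> texts) : texts_(std::move(texts)) {}
  void startDocIteration() override { next_ = 0; }
  bool hasMore() override { return next_ < texts_.size(); }
  Document *nextDoc() override {
    assert(hasMore());
    doc_.load(texts_[next_++]);
    return &doc_;
  }
private:
  std::vector<std::string> texts_;
  std::size_t next_ = 0;
  PosDocument doc_;
};

// Indexing loop: collects (spelling, position) pairs for each document.
using Postings = std::vector<std::pair<std::string, int>>;
std::vector<Postings> indexPositions(DocStream &db) {
  std::vector<Postings> index;
  db.startDocIteration();
  while (db.hasMore()) {
    Document *doc = db.nextDoc();
    index.emplace_back();
    doc->startTermIteration();
    while (doc->hasMore()) {
      // Checked down-cast: nullptr if the stream did not supply positions.
      auto *term = dynamic_cast<PosTerm *>(doc->nextTerm());
      if (term == nullptr) continue;
      index.back().emplace_back(term->spelling(), term->getPosition());
    }
  }
  return index;
}
```

For the input {"the cat sat", "a dog"}, `indexPositions` returns two postings lists, the first being (the, 0), (cat, 1), (sat, 2). The indexer depends only on the base interfaces, and only the cast site knows about `PosTerm`.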
Constructor & Destructor Documentation
virtual ~DocStream ()

Member Function Documentation
virtual bool hasMore ()=0
Return true if another document is available.
Implemented in lemur::parse::BasicDocStream.

virtual Document * nextDoc ()=0
Return a pointer to the next document (static memory; do not delete the returned instance). Call hasMore() before calling nextDoc().
Implemented in lemur::parse::BasicDocStream.

virtual void startDocIteration ()=0
Start document iteration.
Implemented in lemur::parse::BasicDocStream.