
indri::parse::FileClassEnvironmentFactory::Specification Struct Reference

Parsing information for a file class. Used to create a FileClassEnvironment.

#include <FileClassEnvironmentFactory.hpp>


Public Attributes

std::string name
 name of this file class, e.g. trecweb

std::string parser
 document parser for this file class

std::string tokenizer
 document tokenizer for this file class

std::string iterator
 document iterator for this file class

std::string startDocTag
 tag indicating start of a document

std::string endDocTag
 tag indicating the end of a document

std::string endMetadataTag
 tag indicating the end of the metadata fields

std::vector< std::string > include
 tags whose contents should be included in the index. If empty, all tags are included.

std::vector< std::string > exclude
 tags whose contents should be excluded from the index

std::vector< std::string > index
 tags that should be forwarded to the index as tag extents, i.e. named fields.

std::vector< std::string > metadata
 tags whose contents should be indexed as metadata

std::map< indri::parse::ConflationPattern *, std::string > conflations
 tags that should be conflated. The map is of the form tag => conflated tag, e.g. h1 => heading.


Detailed Description

Parsing information for a file class. Used to create a FileClassEnvironment.
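
To make the shape of the struct concrete, the following is a minimal sketch of how such a specification might be filled in. It does not use the Indri headers: `SpecificationSketch` and `ConflationPatternSketch` are hypothetical stand-ins that only mirror the attribute names listed on this page, and `makeExampleSpecification` is an illustrative helper. The file class name `trecweb` and the `h1 => heading` conflation come from this page; every other value (parser, tokenizer, iterator, and tag strings) is an illustrative placeholder, not a documented default.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for indri::parse::ConflationPattern. The real type is
// declared in the Indri headers and may have different members.
struct ConflationPatternSketch {
  std::string tagName;
};

// Hypothetical stand-in that mirrors the public attributes listed on this page.
struct SpecificationSketch {
  std::string name;
  std::string parser;
  std::string tokenizer;
  std::string iterator;
  std::string startDocTag;
  std::string endDocTag;
  std::string endMetadataTag;
  std::vector<std::string> include;
  std::vector<std::string> exclude;
  std::vector<std::string> index;
  std::vector<std::string> metadata;
  std::map<ConflationPatternSketch*, std::string> conflations;
};

// Fill in a specification for a "trecweb"-like file class.
// The caller owns h1Pattern and must keep it alive while the map is in use.
SpecificationSketch makeExampleSpecification(ConflationPatternSketch* h1Pattern) {
  assert(h1Pattern != nullptr);

  SpecificationSketch spec;
  spec.name = "trecweb";             // example name given on this page
  spec.parser = "html";              // placeholder value
  spec.tokenizer = "word";           // placeholder value
  spec.iterator = "tagged";          // placeholder value
  spec.startDocTag = "<DOC>";        // placeholder value
  spec.endDocTag = "</DOC>";         // placeholder value
  spec.endMetadataTag = "</DOCHDR>"; // placeholder value

  // include is left empty, which this page describes as "all tags are included".
  spec.exclude.push_back("script");
  spec.index.push_back("title");
  spec.metadata.push_back("docno");

  // tag => conflated tag, following the h1 => heading example on this page.
  spec.conflations[h1Pattern] = "heading";
  return spec;
}
```

A caller would create the pattern object first, pass its address to `makeExampleSpecification`, and then read the fields back from the returned struct.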


Member Data Documentation

std::map<indri::parse::ConflationPattern*,std::string> indri::parse::FileClassEnvironmentFactory::Specification::conflations
 

tags that should be conflated. The map is of the form tag => conflated tag, e.g. h1 => heading.
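
Because the map is keyed by pointer, `std::map` orders and compares the keys by address rather than by the contents of the pattern they point to. The sketch below illustrates one consequence: finding the conflated name for a tag means inspecting each pattern, not calling `find` with a freshly built pattern. `ConflationPatternSketch`, its `tagName` member, and `conflateTag` are hypothetical names used only for illustration; this page does not say how Indri itself resolves conflations.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for indri::parse::ConflationPattern.
struct ConflationPatternSketch {
  std::string tagName;
};

typedef std::map<ConflationPatternSketch*, std::string> ConflationMap;

// Return the conflated name for tagName, or tagName unchanged if no pattern
// in the map names that tag. The scan compares pattern contents, because the
// map itself only compares key addresses.
std::string conflateTag(const ConflationMap& conflations,
                        const std::string& tagName) {
  for (ConflationMap::const_iterator it = conflations.begin();
       it != conflations.end(); ++it) {
    assert(it->first != nullptr);
    if (it->first->tagName == tagName) {
      return it->second;
    }
  }
  return tagName;
}
```

With entries for `h1` and `h2` both mapped to `heading`, `conflateTag` returns `heading` for either tag and returns any other tag name unchanged. A second pattern object that also names `h1` is a different key, so `find` on its address does not locate the existing entry.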

std::string indri::parse::FileClassEnvironmentFactory::Specification::endDocTag
 

tag indicating the end of a document

std::string indri::parse::FileClassEnvironmentFactory::Specification::endMetadataTag
 

tag indicating the end of the metadata fields

std::vector<std::string> indri::parse::FileClassEnvironmentFactory::Specification::exclude
 

tags whose contents should be excluded from the index

std::vector<std::string> indri::parse::FileClassEnvironmentFactory::Specification::include
 

tags whose contents should be included in the index. If empty, all tags are included.
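
The interaction between `include` and `exclude` can be sketched as a single predicate. This is an illustration under stated assumptions, not Indri's implementation: `containsTag` and `isTagIndexed` are hypothetical helpers, and the choice to let `exclude` take precedence when a tag appears in both lists is an assumption that this page does not confirm. The only rule taken from the page is that an empty `include` list means all tags are included.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// True if tag appears in tags.
bool containsTag(const std::vector<std::string>& tags, const std::string& tag) {
  return std::find(tags.begin(), tags.end(), tag) != tags.end();
}

// Decide whether the contents of a tag would be indexed.
bool isTagIndexed(const std::vector<std::string>& include,
                  const std::vector<std::string>& exclude,
                  const std::string& tag) {
  // Assumption: an excluded tag is dropped even if it is also listed in include.
  if (containsTag(exclude, tag)) {
    return false;
  }
  // From this page: an empty include list means all tags are included.
  return include.empty() || containsTag(include, tag);
}
```

With both lists empty every tag is indexed. Once `include` names at least one tag, only the named tags pass the predicate.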

std::vector<std::string> indri::parse::FileClassEnvironmentFactory::Specification::index
 

tags that should be forwarded to the index as tag extents, i.e. named fields.

std::string indri::parse::FileClassEnvironmentFactory::Specification::iterator
 

document iterator for this file class

std::vector<std::string> indri::parse::FileClassEnvironmentFactory::Specification::metadata
 

tags whose contents should be indexed as metadata

std::string indri::parse::FileClassEnvironmentFactory::Specification::name
 

name of this file class, e.g. trecweb

std::string indri::parse::FileClassEnvironmentFactory::Specification::parser
 

document parser for this file class

std::string indri::parse::FileClassEnvironmentFactory::Specification::startDocTag
 

tag indicating start of a document

std::string indri::parse::FileClassEnvironmentFactory::Specification::tokenizer
 

document tokenizer for this file class


The documentation for this struct was generated from the following file:
Generated on Tue Jun 15 11:03:03 2010 for Lemur by doxygen 1.3.4