#include <FileClassEnvironmentFactory.hpp>
Public Attributes

| Type | Name | Description |
|---|---|---|
| std::string | name | name of this file class, e.g. trecweb |
| std::string | parser | document parser for this file class |
| std::string | tokenizer | document tokenizer for this file class |
| std::string | iterator | document iterator for this file class |
| std::string | startDocTag | tag indicating the start of a document |
| std::string | endDocTag | tag indicating the end of a document |
| std::string | endMetadataTag | tag indicating the end of the metadata fields |
| std::vector< std::string > | include | tags whose contents should be included in the index. If empty, all tags are included. |
| std::vector< std::string > | exclude | tags whose contents should be excluded from the index |
| std::vector< std::string > | index | tags that should be forwarded to the index for tag extents, i.e. named fields |
| std::vector< std::string > | metadata | tags whose contents should be indexed as metadata |
| std::map< indri::parse::ConflationPattern *, std::string > | conflations | tags that should be conflated. The map is of the form tag => conflated tag, e.g. h1 => heading. |
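
To illustrate how the tag lists above might be used together, here is a minimal sketch. It does not use the Indri headers: `FileClassSpec`, `shouldIndexTag`, and `makeExampleSpec` are hypothetical names introduced only for this example, and the rule that `exclude` takes precedence over `include` is an assumption, not something the attribute descriptions state. The only behavior taken from the descriptions is that an empty `include` list means all tags are included.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in mirroring the string and vector attributes listed
// above. It is NOT the Indri type. `conflations` is omitted because its key
// type (indri::parse::ConflationPattern*) is only available from Indri.
struct FileClassSpec {
  std::string name;
  std::string parser;
  std::string tokenizer;
  std::string iterator;
  std::string startDocTag;
  std::string endDocTag;
  std::string endMetadataTag;
  std::vector<std::string> include;
  std::vector<std::string> exclude;
  std::vector<std::string> index;
  std::vector<std::string> metadata;
};

// Returns true if `tag` appears in `tags`.
inline bool containsTag(const std::vector<std::string>& tags,
                        const std::string& tag) {
  return std::find(tags.begin(), tags.end(), tag) != tags.end();
}

// Decides whether the contents of `tag` would be indexed.
// Assumption: an excluded tag is never indexed, even if it is also included.
// From the descriptions: an empty `include` list means every tag is included.
inline bool shouldIndexTag(const FileClassSpec& spec, const std::string& tag) {
  if (containsTag(spec.exclude, tag)) {
    return false;
  }
  return spec.include.empty() || containsTag(spec.include, tag);
}

// Builds a specification with illustrative values only; none of these tag
// names are taken from a real Indri file class. The parser, tokenizer, and
// iterator names are left empty rather than guessed.
inline FileClassSpec makeExampleSpec() {
  FileClassSpec spec;
  spec.name = "example";
  spec.startDocTag = "<DOC>";
  spec.endDocTag = "</DOC>";
  spec.endMetadataTag = "</DOCHDR>";
  spec.exclude = {"script", "style"};
  spec.index = {"title"};
  spec.metadata = {"docno"};
  return spec;
}
```

With the example specification, `include` is empty, so every tag except `script` and `style` would be indexed. Once `include` is given entries, only the listed tags pass the check.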