indri::api::QueryEnvironment Class Reference

Principal class for interacting with Indri indexes during retrieval. Provides the API for opening one or more Repository servers, either local or remote. Provides the API for querying the servers with the Indri query language, and additionally requesting aggregate collection statistics.

#include <QueryEnvironment.hpp>

Public Member Functions

 QueryEnvironment ()
 ~QueryEnvironment ()
void setMemory (UINT64 memory)
 Set the amount of memory to use.

void setBaseline (const std::string &baseline)
 Set the baseline retrieval method.

void setSingleBackgroundModel (bool background)
 Set whether there should be one single background model or context-sensitive models.

void setScoringRules (const std::vector< std::string > &rules)
 Set the scoring rules.

void setStopwords (const std::vector< std::string > &stopwords)
 Set the stopword list for query processing.

void addServer (const std::string &hostname)
 Add a remote server.

void addIndex (const std::string &pathname)
 Add a local repository.

void addIndex (class IndexEnvironment &environment)
void close ()
 Close the QueryEnvironment.

void removeServer (const std::string &hostname)
 Remove a remote server.

void removeIndex (const std::string &pathname)
 Remove a local repository.

QueryResults runQuery (QueryRequest &request)
 Run an Indri query language query.

std::vector< indri::api::ScoredExtentResult > runQuery (const std::string &query, int resultsRequested, const std::string &queryType="indri")
 Run an Indri query language query.

std::vector< indri::api::ScoredExtentResult > runQuery (const std::string &query, const std::vector< lemur::api::DOCID_T > &documentSet, int resultsRequested, const std::string &queryType="indri")
 Run an Indri query language query.

QueryAnnotation * runAnnotatedQuery (const std::string &query, int resultsRequested, const std::string &queryType="indri")
 Run an Indri query language query.

QueryAnnotation * runAnnotatedQuery (const std::string &query, const std::vector< lemur::api::DOCID_T > &documentSet, int resultsRequested, const std::string &queryType="indri")
 Run an Indri query language query.

std::vector< indri::api::ParsedDocument * > documents (const std::vector< lemur::api::DOCID_T > &documentIDs)
 Fetch the parsed documents for a given list of document ids. Caller is responsible for deleting the returned elements.

std::vector< indri::api::ParsedDocument * > documents (const std::vector< indri::api::ScoredExtentResult > &results)
 Fetch the parsed documents for a given list of ScoredExtentResults. Caller is responsible for deleting the returned elements.

std::vector< std::string > documentMetadata (const std::vector< lemur::api::DOCID_T > &documentIDs, const std::string &attributeName)
 Fetch the named metadata attribute for a list of document ids.

std::vector< std::string > documentMetadata (const std::vector< indri::api::ScoredExtentResult > &documentIDs, const std::string &attributeName)
 Fetch the named metadata attribute for a list of ScoredExtentResults.

std::vector< std::string > pathNames (const std::vector< indri::api::ScoredExtentResult > &results)
 Fetch the XPath names of extents for a list of ScoredExtentResults.

std::vector< indri::api::ParsedDocument * > documentsFromMetadata (const std::string &attributeName, const std::vector< std::string > &attributeValues)
 Fetch all documents with a metadata key that matches attributeName, with a value matching one of the attributeValues.

std::vector< lemur::api::DOCID_T > documentIDsFromMetadata (const std::string &attributeName, const std::vector< std::string > &attributeValue)
 Return a list of document IDs where the document has a metadata key that matches attributeName, with a value matching one of the attributeValues.

INT64 termCount ()
 Return total number of terms.

INT64 termCount (const std::string &term)
 Return total number of term occurrences.

INT64 stemCount (const std::string &term)
 Return total number of stem occurrences.

INT64 termFieldCount (const std::string &term, const std::string &field)
 Return total number of term occurrences within a field.

INT64 stemFieldCount (const std::string &term, const std::string &field)
 Return total number of stem occurrences within a field.

double expressionCount (const std::string &expression, const std::string &queryType="indri")
 Return the total number of times this expression appears in the collection.

std::vector< ScoredExtentResult > expressionList (const std::string &expression, const std::string &queryType="indri")
 Return all the occurrences of this expression in the collection. Note that the returned vector may be quite large for large collections, and therefore has the very real possibility of exhausting the memory of the machine. Use this method with discretion.

std::vector< std::string > fieldList ()
 Return the list of fields.

INT64 documentCount ()
 Return total number of documents in the collection.

INT64 documentCount (const std::string &term)
 Return total number of documents containing term in the collection.

INT64 documentStemCount (const std::string &stem)
 Return total number of documents containing stem in the collection.

int documentLength (lemur::api::DOCID_T documentID)
 Return the length of a document.

std::vector< DocumentVector * > documentVectors (const std::vector< lemur::api::DOCID_T > &documentIDs)
 Fetch document vectors for a list of documents. The caller is responsible for deleting the returned DocumentVector objects.

void setMaxWildcardTerms (int maxTerms)
 Set the maximum number of terms a wildcard operator may expand to.

const std::vector< indri::server::QueryServer * > & getServers () const

Private Member Functions

void _mergeQueryResults (indri::infnet::InferenceNetwork::MAllResults &results, std::vector< indri::server::QueryServerResponse * > &responses)
void _copyStatistics (std::vector< indri::lang::RawScorerNode * > &scorerNodes, indri::infnet::InferenceNetwork::MAllResults &statisticsResults)
std::vector< indri::server::QueryServerResponse * > _runServerQuery (std::vector< indri::lang::Node * > &roots, int resultsRequested)
void _sumServerQuery (indri::infnet::InferenceNetwork::MAllResults &results, std::vector< indri::lang::Node * > &roots, int resultsRequested)
void _mergeServerQuery (indri::infnet::InferenceNetwork::MAllResults &results, std::vector< indri::lang::Node * > &roots, int resultsRequested)
void _annotateQuery (indri::infnet::InferenceNetwork::MAllResults &results, const std::vector< lemur::api::DOCID_T > &documentIDs, std::string &annotatorName, indri::lang::Node *queryRoot)
std::vector< indri::api::ScoredExtentResult > _runQuery (indri::infnet::InferenceNetwork::MAllResults &results, const std::string &q, int resultsRequested, const std::vector< lemur::api::DOCID_T > *documentIDs, QueryAnnotation **annotation, const std::string &queryType="indri")
void _scoredQuery (indri::infnet::InferenceNetwork::MAllResults &results, indri::lang::Node *queryRoot, std::string &accumulatorName, int resultsRequested, const std::vector< lemur::api::DOCID_T > *documentSet)
 QueryEnvironment (QueryEnvironment &other)

Private Attributes

std::map< std::string, std::pair< indri::server::QueryServer *, indri::net::NetworkStream * > > _serverNameMap
std::vector< indri::server::QueryServer * > _servers
std::map< std::string, std::pair< indri::server::QueryServer *, indri::collection::Repository * > > _repositoryNameMap
std::vector< indri::collection::Repository * > _repositories
std::vector< indri::net::NetworkStream * > _streams
std::vector< indri::net::NetworkMessageStream * > _messageStreams
Parameters _parameters
bool _baseline


Detailed Description

Principal class for interacting with Indri indexes during retrieval. Provides the API for opening one or more Repository servers, either local or remote. Provides the API for querying the servers with the Indri query language, and additionally requesting aggregate collection statistics.
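
The workflow described above (open a repository, run a query, look up metadata for the results, close) can be sketched as follows. This is a minimal illustration, not the library's own example: the function name topDocnos is hypothetical, the environment type is a template parameter so the sketch compiles without the Indri headers (with Indri it would be indri::api::QueryEnvironment), and "docno" is assumed to be a metadata attribute stored in the index.

```cpp
#include <string>
#include <vector>

// Sketch of one retrieval session. Env is a template parameter so this
// compiles without Indri; with Indri it would be indri::api::QueryEnvironment.
template <typename Env>
std::vector<std::string> topDocnos(Env &env,
                                   const std::string &indexPath,
                                   const std::string &query,
                                   int resultsRequested) {
    env.addIndex(indexPath);  // open a local repository
    auto results = env.runQuery(query, resultsRequested);
    // "docno" is an assumed attribute name; use whatever your index stores.
    std::vector<std::string> names = env.documentMetadata(results, "docno");
    env.close();  // close the environment once the session is done
    return names;
}
```

A call might look like topDocnos(env, "/path/to/index", "#combine(white house)", 10), where the path and the query are placeholders.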


Constructor & Destructor Documentation

indri::api::QueryEnvironment::QueryEnvironment (QueryEnvironment &other) [inline, private]

indri::api::QueryEnvironment::QueryEnvironment ()

indri::api::QueryEnvironment::~QueryEnvironment ()
 


Member Function Documentation

void indri::api::QueryEnvironment::_annotateQuery (indri::infnet::InferenceNetwork::MAllResults &results,
const std::vector< lemur::api::DOCID_T > &documentIDs,
std::string &annotatorName,
indri::lang::Node *queryRoot) [private]

void indri::api::QueryEnvironment::_copyStatistics (std::vector< indri::lang::RawScorerNode * > &scorerNodes,
indri::infnet::InferenceNetwork::MAllResults &statisticsResults) [private]

void indri::api::QueryEnvironment::_mergeQueryResults (indri::infnet::InferenceNetwork::MAllResults &results,
std::vector< indri::server::QueryServerResponse * > &responses) [private]

void indri::api::QueryEnvironment::_mergeServerQuery (indri::infnet::InferenceNetwork::MAllResults &results,
std::vector< indri::lang::Node * > &roots,
int resultsRequested) [private]

std::vector< indri::api::ScoredExtentResult > indri::api::QueryEnvironment::_runQuery (indri::infnet::InferenceNetwork::MAllResults &results,
const std::string &q,
int resultsRequested,
const std::vector< lemur::api::DOCID_T > *documentIDs,
QueryAnnotation **annotation,
const std::string &queryType = "indri") [private]

std::vector< indri::server::QueryServerResponse * > indri::api::QueryEnvironment::_runServerQuery (std::vector< indri::lang::Node * > &roots,
int resultsRequested) [private]

void indri::api::QueryEnvironment::_scoredQuery (indri::infnet::InferenceNetwork::MAllResults &results,
indri::lang::Node *queryRoot,
std::string &accumulatorName,
int resultsRequested,
const std::vector< lemur::api::DOCID_T > *documentSet) [private]

void indri::api::QueryEnvironment::_sumServerQuery (indri::infnet::InferenceNetwork::MAllResults &results,
std::vector< indri::lang::Node * > &roots,
int resultsRequested) [private]
 

void indri::api::QueryEnvironment::addIndex (class IndexEnvironment &environment)
 

Add an IndexEnvironment object. Unlike the other add calls, this one will not close the index when QueryEnvironment::close is called.

Parameters:
environment an IndexEnvironment instance

void indri::api::QueryEnvironment::addIndex (const std::string &pathname)
 

Add a local repository.

Parameters:
pathname the path to the repository.

void indri::api::QueryEnvironment::addServer (const std::string &hostname)
 

Add a remote server.

Parameters:
hostname the host the server is running on

void indri::api::QueryEnvironment::close ()
 

Close the QueryEnvironment.

INT64 indri::api::QueryEnvironment::documentCount (const std::string &term)
 

Return total number of documents containing term in the collection.

Parameters:
term the term to count documents for.
Returns:
total number of documents containing term in the aggregated collection

INT64 indri::api::QueryEnvironment::documentCount ()
 

Return total number of documents in the collection.

Returns:
total number of documents in the aggregated collection
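
The two documentCount overloads are enough to derive a simple inverse document frequency. The sketch below uses the plain log(N/df) formulation, which is one common variant and not a statement about how Indri scores documents. The function name is hypothetical, and the environment type is a template parameter so the code compiles without the Indri headers.

```cpp
#include <cmath>
#include <string>

// idf(term) = log(N / df), where N is the number of documents in the
// collection and df is the number of documents containing the term.
template <typename Env>
double inverseDocumentFrequency(Env &env, const std::string &term) {
    const double n = static_cast<double>(env.documentCount());
    const double df = static_cast<double>(env.documentCount(term));
    if (n <= 0.0 || df <= 0.0) {
        return 0.0;  // empty collection or unseen term: no meaningful value
    }
    return std::log(n / df);
}
```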

std::vector< lemur::api::DOCID_T > indri::api::QueryEnvironment::documentIDsFromMetadata (const std::string &attributeName,
const std::vector< std::string > &attributeValue)

Return a list of document IDs where the document has a metadata key that matches attributeName, with a value matching one of the attributeValues.

Parameters:
attributeName the name of the metadata attribute (e.g. 'url' or 'docno')
attributeValue values that the metadata attribute should match
Returns:
a vector of document IDs whose metadata matches the given criteria

int indri::api::QueryEnvironment::documentLength (lemur::api::DOCID_T documentID)
 

Return the length of a document.

Parameters:
documentID the document id.
Returns:
length of the document, documentID
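
As an illustration of combining documentIDsFromMetadata with documentLength, the following sketch maps external identifiers to internal document IDs and then fetches each length. It assumes the index stores a 'docno' attribute that can be matched; the function name is hypothetical, and the environment type is a template parameter so the code compiles without the Indri headers. No assumption is made that the returned IDs come back in the same order as the input values.

```cpp
#include <string>
#include <vector>

// Look up documents by their external 'docno' values and return the
// length of each matching document.
template <typename Env>
std::vector<int> lengthsForDocnos(Env &env,
                                  const std::vector<std::string> &docnos) {
    std::vector<int> lengths;
    for (auto id : env.documentIDsFromMetadata("docno", docnos)) {
        lengths.push_back(env.documentLength(id));
    }
    return lengths;
}
```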

std::vector< std::string > indri::api::QueryEnvironment::documentMetadata (const std::vector< indri::api::ScoredExtentResult > &documentIDs,
const std::string &attributeName)
 

Fetch the named metadata attribute for a list of ScoredExtentResults.

Parameters:
documentIDs the list of ScoredExtentResults
attributeName the name of the metadata attribute
Returns:
the vector of string values for that attribute

std::vector< std::string > indri::api::QueryEnvironment::documentMetadata (const std::vector< lemur::api::DOCID_T > &documentIDs,
const std::string &attributeName)
 

Fetch the named metadata attribute for a list of document ids.

Parameters:
documentIDs the list of ids
attributeName the name of the metadata attribute
Returns:
the vector of string values for that attribute

std::vector< indri::api::ParsedDocument * > indri::api::QueryEnvironment::documents (const std::vector< indri::api::ScoredExtentResult > &results)

Fetch the parsed documents for a given list of ScoredExtentResults. Caller is responsible for deleting the returned elements.

Parameters:
results the list of ScoredExtentResults
Returns:
the vector of ParsedDocument pointers.

std::vector< indri::api::ParsedDocument * > indri::api::QueryEnvironment::documents (const std::vector< lemur::api::DOCID_T > &documentIDs)
 

Fetch the parsed documents for a given list of document ids. Caller is responsible for deleting the returned elements.

Parameters:
documentIDs the list of ids
Returns:
the vector of ParsedDocument pointers.
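
Because the caller owns the returned pointers, one way to avoid leaks is to hand them to smart pointers straight away. The helper below is a sketch of that idea and is not part of the Indri API; the name adopt is hypothetical, and the code assumes a compiler with std::unique_ptr (C++11 or later) and that the elements may be released with plain delete, as the description above implies.

```cpp
#include <memory>
#include <vector>

// Take ownership of every raw pointer in the vector so that each element
// is deleted automatically when the returned vector goes out of scope.
template <typename T>
std::vector<std::unique_ptr<T>> adopt(std::vector<T *> raw) {
    std::vector<std::unique_ptr<T>> owned;
    owned.reserve(raw.size());  // reserve first so emplace_back cannot reallocate
    for (T *p : raw) {
        owned.emplace_back(p);  // the unique_ptr now owns p
    }
    return owned;
}
```

With Indri this might be used as adopt(env.documents(ids)); the same helper would apply to the pointers returned by documentVectors.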

std::vector< indri::api::ParsedDocument * > indri::api::QueryEnvironment::documentsFromMetadata (const std::string &attributeName,
const std::vector< std::string > &attributeValues)
 

Fetch all documents with a metadata key that matches attributeName, with a value matching one of the attributeValues.

Parameters:
attributeName the name of the metadata attribute (e.g. 'url' or 'docno')
attributeValues values that the metadata attribute should match
Returns:
a vector of ParsedDocuments that match the given metadata criteria

INT64 indri::api::QueryEnvironment::documentStemCount (const std::string &stem)
 

Return total number of documents containing stem in the collection.

Parameters:
stem the prestemmed term to count documents for.
Returns:
total number of documents containing stem in the aggregated collection

std::vector< indri::api::DocumentVector * > indri::api::QueryEnvironment::documentVectors (const std::vector< lemur::api::DOCID_T > &documentIDs)

Fetch document vectors for a list of documents. The caller is responsible for deleting the returned DocumentVector objects.

Parameters:
documentIDs the vector of document ids.
Returns:
the vector of DocumentVector pointers for the specified documents.

double indri::api::QueryEnvironment::expressionCount (const std::string &expression,
const std::string &queryType = "indri")
 

Return the total number of times this expression appears in the collection.

Parameters:
expression The expression to evaluate, probably an ordered or unordered window expression
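
As an illustration of an ordered window expression, the sketch below counts exact two-word phrases. It relies on the Indri query language operator #1(a b), an ordered window of width 1, which is not described on this page; the helper names are hypothetical, and the environment type is a template parameter so the code compiles without the Indri headers.

```cpp
#include <string>

// Build an ordered-window expression that matches "first" immediately
// followed by "second".
inline std::string phraseExpression(const std::string &first,
                                    const std::string &second) {
    return "#1(" + first + " " + second + ")";
}

// Number of times the two-word phrase occurs in the collection.
template <typename Env>
double phraseFrequency(Env &env, const std::string &first,
                       const std::string &second) {
    return env.expressionCount(phraseExpression(first, second));
}
```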

std::vector< indri::api::ScoredExtentResult > indri::api::QueryEnvironment::expressionList (const std::string &expression,
const std::string &queryType = "indri")
 

Return all the occurrences of this expression in the collection. Note that the returned vector may be quite large for large collections, and therefore has the very real possibility of exhausting the memory of the machine. Use this method with discretion.

Parameters:
expression The expression to evaluate, probably an ordered or unordered window expression

std::vector< std::string > indri::api::QueryEnvironment::fieldList ()
 

Return the list of fields.

Returns:
vector of field names.

const std::vector<indri::server::QueryServer*>& indri::api::QueryEnvironment::getServers () const [inline]
 

std::vector< std::string > indri::api::QueryEnvironment::pathNames (const std::vector< indri::api::ScoredExtentResult > &results)
 

Fetch the XPath names of extents for a list of ScoredExtentResults.

Parameters:
results the list of ScoredExtentResults
Returns:
the vector of string XPath names for the extents

void indri::api::QueryEnvironment::removeIndex (const std::string &pathname)
 

Remove a local repository.

Parameters:
pathname the path to the repository.

void indri::api::QueryEnvironment::removeServer (const std::string &hostname)
 

Remove a remote server.

Parameters:
hostname the host the server is running on

indri::api::QueryAnnotation * indri::api::QueryEnvironment::runAnnotatedQuery (const std::string &query,
const std::vector< lemur::api::DOCID_T > &documentSet,
int resultsRequested,
const std::string &queryType = "indri")
 

Run an Indri query language query.

See also:
QueryAnnotation
Parameters:
query the query to run
documentSet the working set of document ids to evaluate
resultsRequested maximum number of results to return
Returns:
pointer to QueryAnnotations for the query

indri::api::QueryAnnotation * indri::api::QueryEnvironment::runAnnotatedQuery (const std::string &query,
int resultsRequested,
const std::string &queryType = "indri")
 

Run an Indri query language query.

See also:
QueryAnnotation
Parameters:
query the query to run
resultsRequested maximum number of results to return
Returns:
pointer to QueryAnnotations for the query

std::vector< indri::api::ScoredExtentResult > indri::api::QueryEnvironment::runQuery (const std::string &query,
const std::vector< lemur::api::DOCID_T > &documentSet,
int resultsRequested,
const std::string &queryType = "indri")
 

Run an Indri query language query.

See also:
ScoredExtentResult
Parameters:
query the query to run
documentSet the working set of document ids to evaluate
resultsRequested maximum number of results to return
Returns:
the vector of ScoredExtentResults for the query
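
The working-set overload lends itself to two-stage retrieval: run a first query over the whole collection, then evaluate a second query only over the documents it returned. The following is a sketch under stated assumptions: the function name rescore is hypothetical, the environment type is a template parameter so the code compiles without the Indri headers, and the document and score members of ScoredExtentResult are assumed here because this page does not describe that class.

```cpp
#include <string>
#include <utility>
#include <vector>

// Two-stage retrieval. DocId stands in for lemur::api::DOCID_T.
template <typename DocId, typename Env>
std::vector<std::pair<DocId, double> > rescore(Env &env,
                                               const std::string &firstPass,
                                               const std::string &secondPass,
                                               int depth) {
    // Stage 1: collect the working set from an initial query.
    std::vector<DocId> workingSet;
    for (const auto &r : env.runQuery(firstPass, depth)) {
        workingSet.push_back(r.document);
    }
    std::vector<std::pair<DocId, double> > scored;
    if (workingSet.empty()) {
        return scored;  // nothing to re-score
    }
    // Stage 2: evaluate the second query only over that working set.
    for (const auto &r : env.runQuery(secondPass, workingSet, depth)) {
        scored.push_back(std::make_pair(r.document, r.score));
    }
    return scored;
}
```

With Indri the call would be written as rescore<lemur::api::DOCID_T>(env, first, second, depth).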

std::vector< indri::api::ScoredExtentResult > indri::api::QueryEnvironment::runQuery (const std::string &query,
int resultsRequested,
const std::string &queryType = "indri")
 

Run an Indri query language query.

See also:
ScoredExtentResult
Parameters:
query the query to run
resultsRequested maximum number of results to return
Returns:
the vector of ScoredExtentResults for the query

indri::api::QueryResults indri::api::QueryEnvironment::runQuery (QueryRequest &request)
 

Run an Indri query language query.

Parameters:
request the query to run
Returns:
the QueryResults for the request

void indri::api::QueryEnvironment::setBaseline (const std::string &baseline)

Set the baseline retrieval method.

Parameters:
baseline the baseline retrieval method to use

void indri::api::QueryEnvironment::setMaxWildcardTerms (int maxTerms)

Set the maximum number of terms a wildcard operator may expand to.

Parameters:
maxTerms the maximum number of terms to expand a wildcard operator argument (default 100).

void indri::api::QueryEnvironment::setMemory (UINT64 memory)
 

Set the amount of memory to use.

Parameters:
memory number of bytes to allocate

void indri::api::QueryEnvironment::setScoringRules (const std::vector< std::string > &rules)
 

Set the scoring rules.

Parameters:
rules the vector of scoring rules.
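
This page does not specify what a scoring rule looks like. The sketch below assumes the comma-separated key:value form used for Indri smoothing rules, for example "method:dirichlet,mu:2500"; treat that format as an assumption and check it against your Indri version. The helper names are hypothetical, and the environment type is a template parameter so the code compiles without the Indri headers.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Format a Dirichlet smoothing rule (assumed rule syntax, see lead-in).
inline std::string dirichletRule(int mu) {
    std::ostringstream rule;
    rule << "method:dirichlet,mu:" << mu;
    return rule.str();
}

// Install a single Dirichlet smoothing rule on the environment.
template <typename Env>
void useDirichletSmoothing(Env &env, int mu) {
    std::vector<std::string> rules(1, dirichletRule(mu));
    env.setScoringRules(rules);
}
```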

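void indri::api::QueryEnvironment::setSingleBackgroundModel (bool background)

Set whether there should be one single background model or context-sensitive models.

Parameters:
background true for one background model, false for context-sensitive models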

void indri::api::QueryEnvironment::setStopwords (const std::vector< std::string > &stopwords)
 

Set the stopword list for query processing.

Parameters:
stopwords the list of stopwords

INT64 indri::api::QueryEnvironment::stemCount (const std::string &term)
 

Return total number of stem occurrences.

Parameters:
term the stem to count
Returns:
total frequency of this stem in the aggregated collection

INT64 indri::api::QueryEnvironment::stemFieldCount (const std::string &term,
const std::string &field)
 

Return total number of stem occurrences within a field.

Parameters:
term the stem to count
field the name of the field
Returns:
total frequency of this stem within this field in the aggregated collection

INT64 indri::api::QueryEnvironment::termCount (const std::string &term)
 

Return total number of term occurrences.

Parameters:
term the term to count
Returns:
total frequency of this term in the aggregated collection

INT64 indri::api::QueryEnvironment::termCount ()
 

Return total number of terms.

Returns:
total number of terms in the aggregated collection
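
Taken together, the two termCount overloads give the maximum-likelihood collection probability of a term, P(term | C) = termCount(term) / termCount(). A minimal sketch follows; the function name is hypothetical, and the environment type is a template parameter so the code compiles without the Indri headers.

```cpp
#include <string>

// Fraction of all term occurrences in the collection that are `term`.
template <typename Env>
double collectionProbability(Env &env, const std::string &term) {
    const double total = static_cast<double>(env.termCount());
    if (total <= 0.0) {
        return 0.0;  // empty collection: avoid dividing by zero
    }
    return static_cast<double>(env.termCount(term)) / total;
}
```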

INT64 indri::api::QueryEnvironment::termFieldCount (const std::string &term,
const std::string &field)
 

Return total number of term occurrences within a field.

Parameters:
term the term to count
field the name of the field
Returns:
total frequency of this term within this field in the aggregated collection


Member Data Documentation

bool indri::api::QueryEnvironment::_baseline [private]
 

std::vector<indri::net::NetworkMessageStream*> indri::api::QueryEnvironment::_messageStreams [private]
 

Parameters indri::api::QueryEnvironment::_parameters [private]
 

std::vector<indri::collection::Repository*> indri::api::QueryEnvironment::_repositories [private]
 

std::map<std::string, std::pair<indri::server::QueryServer *, indri::collection::Repository *> > indri::api::QueryEnvironment::_repositoryNameMap [private]
 

std::map<std::string, std::pair<indri::server::QueryServer *, indri::net::NetworkStream *> > indri::api::QueryEnvironment::_serverNameMap [private]
 

std::vector<indri::server::QueryServer*> indri::api::QueryEnvironment::_servers [private]
 

std::vector<indri::net::NetworkStream*> indri::api::QueryEnvironment::_streams [private]
 


Generated on Tue Jun 15 11:02:59 2010 for Lemur by doxygen 1.3.4