The ClueWeb22 Dataset:
Query Details

ClueWeb22 includes a large set of crowdsourced queries and relevance assessments created by the Lemur Project. These queries are intended to support the training and evaluation of ranking algorithms.

This work is in progress. The initial sets of queries and relevance assessments are planned for release in late August 2022.

Queries and Relevance Assessments

Queries and relevance assessments were collected from crowd workers. To participate, a worker must be from an English-speaking country, have a good worker score, and pass a qualification test in which they correctly select relevant documents and avoid selecting non-relevant documents.

The process for a single query was as follows.

A single worker may contribute no more than ten queries per day.
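
The daily limit could be enforced with a simple per-worker counter. The following is a minimal sketch in Python, not the Lemur Project's actual tooling: the class name `DailyQueryCap`, the method `try_accept`, and the worker identifiers are hypothetical, and only the limit of ten queries per day comes from the text above.

```python
from collections import defaultdict
from datetime import date

MAX_QUERIES_PER_DAY = 10  # the limit stated in the text


class DailyQueryCap:
    """Hypothetical helper that counts queries per worker per calendar day."""

    def __init__(self, limit: int = MAX_QUERIES_PER_DAY) -> None:
        self.limit = limit
        # Maps (worker_id, day) to the number of queries accepted so far.
        self._counts = defaultdict(int)

    def try_accept(self, worker_id: str, day: date) -> bool:
        """Record one query and return True, or return False if the cap is reached."""
        key = (worker_id, day)
        if self._counts[key] >= self.limit:
            return False
        self._counts[key] += 1
        return True
```

Under these assumptions, the eleventh call to `try_accept` for the same worker on the same day returns `False`, and the count starts again on the next day because the date is part of the key.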