ClueWeb09
Sample Files
Below is a list of sample files taken from the ClueWeb09 dataset. Each file has 100 pages (WARC response records).
ClueWeb09_English_Sample.warc.gz (348k)
ClueWeb09_Chinese_Sample.warc.gz (451k)
ClueWeb09_Spanish_Sample.warc.gz (370k)
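
The statement that each sample file holds 100 WARC response records can be spot-checked after download. The following is a minimal sketch using only the Python standard library; the function name count_response_records is hypothetical, and it assumes that a line reading "WARC-Type: response" appears only in record headers, never in page content. It counts header lines rather than fully parsing the WARC format, so treat the result as an approximation.

```python
import gzip
import sys


def count_response_records(path):
    """Count lines that look like a 'WARC-Type: response' header.

    Assumption: such a line occurs only in WARC record headers.
    This is a line scan, not a full WARC parser.
    """
    count = 0
    # gzip.open transparently reads files made of several gzip members.
    with gzip.open(path, "rb") as f:
        for line in f:
            if line.strip().lower() == b"warc-type: response":
                count += 1
    return count


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python count_records.py ClueWeb09_English_Sample.warc.gz
    print(count_response_records(sys.argv[1]))
```

A line scan is used here because it does not rely on the Content-Length header of each record being accurate; a full parser would be the better choice when record bodies need to be extracted.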