Create ClueWeb12 B13 Dataset

Download ClueWeb12-CreateB13.tgz (1.3G)

To create the ClueWeb12 B13 dataset, you must run this tool on the ClueWeb12 v1.1 dataset.

Extract the archive using the following command:

$ tar -zxvf ClueWeb12-CreateB13.tgz
ClueWeb12-CreateB13/
ClueWeb12-CreateB13/README.txt
ClueWeb12-CreateB13/software/
ClueWeb12-CreateB13/software/META-INF/
ClueWeb12-CreateB13/software/META-INF/MANIFEST.MF
ClueWeb12-CreateB13/software/WarcRecord.java
ClueWeb12-CreateB13/software/CreateClueWeb12B13Dataset.java
ClueWeb12-CreateB13/software/Makefile
ClueWeb12-CreateB13/software/CreateClueWeb12B13Dataset.class
ClueWeb12-CreateB13/software/WarcRecord$WarcHeader.class
ClueWeb12-CreateB13/software/WarcRecord.class
ClueWeb12-CreateB13/software/CreateClueWeb12B13Dataset.jar
ClueWeb12-CreateB13/checksums.tgz
ClueWeb12-CreateB13/recordcounts.tgz
ClueWeb12-CreateB13/ClueWeb12_B13_DocID_To_URL.txt.bz2
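
Before extracting a download of this size, it can help to confirm that the archive is readable and to see where it will unpack. The following is a minimal sketch, assuming a tar that supports the -z option (such as GNU tar or bsdtar). The function name list_top_level is hypothetical and is not part of the ClueWeb12-CreateB13 distribution.

```shell
# Hypothetical helper, not part of ClueWeb12-CreateB13.
# Prints the distinct top-level entries of a gzip-compressed tar archive
# without extracting it. Returns non-zero if tar cannot read the archive.
list_top_level() {
    archive="$1"
    # -t lists members, -z filters through gzip, -f names the archive file.
    listing=$(tar -tzf "$archive") || return 1
    # Keep only the first path component of each member and de-duplicate.
    printf '%s\n' "$listing" | cut -d/ -f1 | sort -u
}

# Example (assumes the download is in the current directory):
# list_top_level ClueWeb12-CreateB13.tgz
```

Since every entry in the listing above sits under ClueWeb12-CreateB13/, the example call should report that single directory. A non-zero return status suggests a truncated or corrupted download.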