The ClueWeb22 Dataset:
Version 02.02 Corrections
Use the files below to upgrade a version 02.01 dataset to version 02.02. Each file is about 4.2 GB.
ClueWeb22-A: Replace the files in ClueWeb22-A/ClueWeb22-ID_URL-Hash_maps
- ClueWeb22-ID_URL-Hash_map_00.csv.gz
- ClueWeb22-ID_URL-Hash_map_01.csv.gz
- ClueWeb22-ID_URL-Hash_map_02.csv.gz
- ClueWeb22-ID_URL-Hash_map_03.csv.gz
- ClueWeb22-ID_URL-Hash_map_04.csv.gz
- ClueWeb22-ID_URL-Hash_map_05.csv.gz
- ClueWeb22-ID_URL-Hash_map_06.csv.gz
- ClueWeb22-ID_URL-Hash_map_07.csv.gz
- ClueWeb22-ID_URL-Hash_map_08.csv.gz
- ClueWeb22-ID_URL-Hash_map_09.csv.gz
ClueWeb22-L: Replace the files in ClueWeb22-L/ClueWeb22-ID_URL-Hash_maps
- ClueWeb22-ID_URL-Hash_map_00.csv.gz
- ClueWeb22-ID_URL-Hash_map_01.csv.gz
- ClueWeb22-ID_URL-Hash_map_02.csv.gz
- ClueWeb22-ID_URL-Hash_map_03.csv.gz
- ClueWeb22-ID_URL-Hash_map_04.csv.gz
- ClueWeb22-ID_URL-Hash_map_05.csv.gz
- ClueWeb22-ID_URL-Hash_map_06.csv.gz
- ClueWeb22-ID_URL-Hash_map_07.csv.gz
- ClueWeb22-ID_URL-Hash_map_08.csv.gz
- ClueWeb22-ID_URL-Hash_map_09.csv.gz
- ClueWeb22-ID_URL-Hash_map_10.csv.gz
- ClueWeb22-ID_URL-Hash_map_11.csv.gz
- ClueWeb22-ID_URL-Hash_map_12.csv.gz
- ClueWeb22-ID_URL-Hash_map_13.csv.gz
- ClueWeb22-ID_URL-Hash_map_14.csv.gz
- ClueWeb22-ID_URL-Hash_map_15.csv.gz
- ClueWeb22-ID_URL-Hash_map_16.csv.gz
- ClueWeb22-ID_URL-Hash_map_17.csv.gz
- ClueWeb22-ID_URL-Hash_map_18.csv.gz
- ClueWeb22-ID_URL-Hash_map_19.csv.gz
- ClueWeb22-ID_URL-Hash_map_20.csv.gz
- ClueWeb22-ID_URL-Hash_map_21.csv.gz
- ClueWeb22-ID_URL-Hash_map_22.csv.gz
- ClueWeb22-ID_URL-Hash_map_23.csv.gz
- ClueWeb22-ID_URL-Hash_map_24.csv.gz
- Files 25-49 are already in the correct format and do not need to be replaced
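
The replacement step listed above could be scripted roughly as follows. This is only a sketch, not an official upgrade tool. It assumes the downloaded files were saved under a local directory with one subfolder per subset (download_dir/ClueWeb22-A and download_dir/ClueWeb22-L), because the file names are identical for both subsets. The names download_dir, dataset_root, map_file_names, and replace_map_files are hypothetical and chosen for illustration. The file counts (10 for ClueWeb22-A, 25 for ClueWeb22-L) are taken from the lists above.

```python
import shutil
from pathlib import Path

# Number of map files to replace in each subset, taken from the lists above.
REPLACED_COUNTS = {"ClueWeb22-A": 10, "ClueWeb22-L": 25}

MAPS_DIR = "ClueWeb22-ID_URL-Hash_maps"


def map_file_names(count):
    """Return the first `count` map file names, numbered 00, 01, ..."""
    return [f"ClueWeb22-ID_URL-Hash_map_{i:02d}.csv.gz" for i in range(count)]


def replace_map_files(download_dir, dataset_root, subset, dry_run=True):
    """Copy downloaded 02.02 map files over the 02.01 files for one subset.

    Assumed (hypothetical) layout:
      download_dir/<subset>/<file>             downloaded replacements
      dataset_root/<subset>/<MAPS_DIR>/<file>  existing dataset files
    Returns the list of (source, target) pairs. With dry_run=True nothing
    is copied, so the plan can be inspected first.
    """
    target_dir = Path(dataset_root) / subset / MAPS_DIR
    if not target_dir.is_dir():
        raise NotADirectoryError(target_dir)

    planned = []
    for name in map_file_names(REPLACED_COUNTS[subset]):
        source = Path(download_dir) / subset / name
        # Fail before copying anything if a download is missing.
        if not source.is_file():
            raise FileNotFoundError(source)
        planned.append((source, target_dir / name))

    if not dry_run:
        for source, target in planned:
            shutil.copy2(source, target)
    return planned
```

Calling replace_map_files("downloads", "/data/ClueWeb22", "ClueWeb22-L") with the default dry_run=True only reports which files would be overwritten. Pass dry_run=False once the plan looks right. Since each file is about 4.2 GB, checking that every download is present before copying avoids leaving a subset half upgraded.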