ClueWeb12

The ClueWeb12 dataset consists of 733,019,372 English web pages, collected between February 10, 2012 and May 10, 2012. It was created to support research on information retrieval and related human language technologies. ClueWeb12 is a companion or successor to the ClueWeb09 web dataset. Distribution of ClueWeb12 began in January 2013.

Personal Data Attributes

Description: Personal Data related Information

ChildrenData: No
Personal Data: No
Personal data was manifestly made public by the data subject: No
Additional Info
Accessibility: Trans National Access
Accessibility Mode: OnLine Access
Availability: On-Site
Basic rights: Temporary download of a single copy only
Consent obtained also covers the envisaged transfer of the personal data outside the EU: No
Consent of the data subject: No
Creation Date: 2013-01-17
Creator: AA.VV., Carnegie Mellon University
DataProtectionDirective: Not applicable
Field/Scope of use: Research only
Format: ascii
Group: Societal Debates and Misinformation
Language: eng, English
Manifestation Type: Replica
Processing Degree: Primary
Size: 733,019,372 English web pages
SoBigData Node: SoBigData EU
Sublicense rights: No
Territory of use: World Wide
Thematic Cluster: Web Analytics [WA]
TimeCoverage: 2012-02-10 - 2012-05-10
system:type: Dataset
Management Info
Author: Muntean Cristina
Maintainer: Muntean Cristina
Version: 1
Last Updated: 18 October 2023, 23:47 (CEST)
Created: 29 June 2018, 11:34 (CEST)