MSN Search query log
The data consists of an MSN Search query log excerpt with 15 million queries from US users, sampled over one month of activity. Data attributes made available per query: 1)...
WIRE dataset
This dataset consists of 503 pairs of Wikipedia entities drawn from the New York Times dataset, each with a human-assigned relatedness score. The domain experts based their...
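Relatedness benchmarks of this kind are commonly used by correlating a system's scores with the human-assigned ones. The sketch below assumes Spearman's rank correlation as the metric; the description above does not say which metric applies, and the helper names and toy scores are hypothetical, not part of the dataset.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n in ascending order; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # convert 0-based positions to 1-based ranks
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(system: Sequence[float], human: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the (tie-averaged) ranks."""
    if len(system) != len(human) or len(system) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(system), average_ranks(human))


if __name__ == "__main__":
    human = [0.9, 0.2, 0.5, 0.7]   # hypothetical human scores for 4 entity pairs
    system = [0.8, 0.1, 0.4, 0.6]  # hypothetical system scores, same ordering
    print(spearman(system, human))
```

Rank correlation is used here because only the ordering of pairs by relatedness matters, not the scale of the scores.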
Wikipedia Word Embeddings
Embeddings were created by applying word2vec skip-gram to a corpus of Wikipedia non-stub articles from a December 2015 English dump, with the following parameters: -cbow 0...
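The entry does not state how the vectors are distributed. Assuming the common word2vec text format (a "vocab_size dimension" header line, then one word followed by its vector components per line), a minimal sketch for loading them and comparing two words by cosine similarity could look like this; the function names and the toy vectors are hypothetical.

```python
import io
import math
from typing import Dict, List, TextIO


def load_word2vec_text(stream: TextIO) -> Dict[str, List[float]]:
    """Parse the word2vec text format: 'count dim' header, then 'word v1 ... vdim' lines."""
    header = stream.readline().split()
    dim = int(header[1])
    vectors: Dict[str, List[float]] = {}
    for line in stream:
        parts = line.split()
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(v) for v in parts[1:]]
    return vectors


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Toy 2-dimensional vectors standing in for a real embedding file.
    sample = "3 2\nking 1.0 0.0\nqueen 0.8 0.6\napple 0.0 1.0\n"
    vecs = load_word2vec_text(io.StringIO(sample))
    print(cosine(vecs["king"], vecs["queen"]))
```

For a file of this size, a library loader such as gensim's `KeyedVectors.load_word2vec_format` is likely more practical; the sketch only shows the format and the similarity computation.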
Amazon reviews
This entry links to a dataset containing product reviews and metadata from Amazon, including 142.8 million reviews spanning May 1996 to July 2014. This dataset includes reviews...
Facebook EuroSys 2009
This dataset contains social and interaction graphs representing two large-scale Facebook regional networks. Social graphs describe Facebook friendships between users...
Facebook - New Orleans regional network
This dataset contains information about 90,269 users and 3,646,662 friendship links between those users. These users belong to the New Orleans Facebook regional network. The...
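Assuming the friendship links are undirected and each is counted once, the figures above imply an average degree of roughly 81 friends per user and a very sparse graph:

```latex
\bar{k} = \frac{2|E|}{|V|} = \frac{2 \times 3{,}646{,}662}{90{,}269} = \frac{7{,}293{,}324}{90{,}269} \approx 80.8
\qquad
\rho = \frac{2|E|}{|V|\,(|V|-1)} = \frac{\bar{k}}{|V|-1} = \frac{80.8}{90{,}268} \approx 8.95 \times 10^{-4}
```

Here $|V|$ is the number of users and $|E|$ the number of friendship links; if links were instead stored once per direction, both values would halve.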
CoPhIR
The CoPhIR (Content-based Photo Image Retrieval) Test-Collection was developed to enable significant tests of the scalability of the SAPIR project infrastructure (SAPIR:...
Conversational search dataset with labels
CAsT 2019 data is split into two files: one for training and the other for testing. Training set: CAsT 2019 conversations from the training set and from the test set without...
MAMe dataset
The MAMe dataset is an image classification dataset with remarkably high-resolution images of variable shape. The goal of MAMe is to provide a tool for studying the...
A dataset of gamers on Twitter
This gaming-related dataset consists of 8,932 users (labeled as gamers) engaging in game-related conversations. In June 2018 we collected their timelines (the most recent 3200...
Learning to quantify: LeQua 2022 datasets
The aim of LeQua 2022 (the 1st edition of the CLEF “Learning to Quantify” lab) is to allow the comparative evaluation of methods for “learning to quantify” in textual...
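Learning to quantify means estimating the class prevalences of a set of unlabeled items, rather than labeling each item individually. As a hedged illustration (not LeQua's official baselines or evaluation code), two classic binary baselines, Classify and Count (CC) and Adjusted Classify and Count (ACC), can be sketched as follows; the function names and inputs are hypothetical.

```python
from typing import Sequence


def classify_and_count(predictions: Sequence[int]) -> float:
    """CC: estimated positive prevalence = fraction of items predicted positive (label 1)."""
    return sum(predictions) / len(predictions)


def adjusted_classify_and_count(predictions: Sequence[int], tpr: float, fpr: float) -> float:
    """ACC: correct the CC estimate with the classifier's true/false positive rates.

    Since p_cc = tpr * p + fpr * (1 - p), solving for p gives
    p = (p_cc - fpr) / (tpr - fpr). The rates are assumed to be estimated
    beforehand, e.g. by cross-validation on labeled training data.
    """
    if tpr == fpr:
        raise ValueError("tpr and fpr must differ for the correction to be defined")
    p_cc = classify_and_count(predictions)
    p_acc = (p_cc - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, p_acc))  # clip to a valid prevalence in [0, 1]


if __name__ == "__main__":
    preds = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # hypothetical classifier outputs on one sample
    print(classify_and_count(preds))                           # CC estimate
    print(adjusted_classify_and_count(preds, tpr=0.8, fpr=0.1))  # ACC estimate
```

CC is biased whenever the classifier's errors are not symmetric; ACC removes that bias in expectation, at the cost of needing reliable estimates of `tpr` and `fpr`.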
Product Reviews for Ordinal Quantification
This data set comprises a labeled training set, validation samples, and testing samples for ordinal quantification. It appears in our research paper "Ordinal Quantification...
Cherenkov Telescope Data for Ordinal Quantification
This labeled data set is targeted at ordinal quantification. It appears in our research paper "Ordinal Quantification Through Regularization", which we have published at...
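In ordinal quantification the classes are ordered, so an estimate that shifts prevalence into an adjacent class should be penalized less than one that shifts it into a distant class. One measure often used for this (assumed here; the entries above do not say how their data is scored) is the Earth Mover's Distance between true and estimated prevalence vectors. With unit distance between adjacent classes it reduces to the sum of absolute differences between the two cumulative distributions. The helper name is hypothetical.

```python
from itertools import accumulate
from typing import Sequence


def ordinal_emd(true_prev: Sequence[float], est_prev: Sequence[float]) -> float:
    """EMD between prevalence vectors over ordered classes (unit spacing assumed)."""
    if len(true_prev) != len(est_prev):
        raise ValueError("prevalence vectors must have the same length")
    cdf_true = list(accumulate(true_prev))
    cdf_est = list(accumulate(est_prev))
    # For valid prevalence vectors (each summing to 1) the final cumulative
    # value is 1 in both, so that term is skipped.
    return sum(abs(a - b) for a, b in zip(cdf_true[:-1], cdf_est[:-1]))


if __name__ == "__main__":
    # Hypothetical 3-class example: half the mass moves up by one class.
    print(ordinal_emd([0.5, 0.5, 0.0], [0.0, 0.5, 0.5]))
```

Unlike absolute error summed over classes, this measure grows with how far the misplaced prevalence mass travels along the class order.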
VaxxHesitancy: A Dataset for Studying Hesitancy Towards COVID-19 Vaccination ...
We create a publicly available dataset of over 3,100 COVID-19 vaccine-related tweets labeled as one of four stance categories: pro-vaxx, anti-vaxx, vaxx-hesitant, or...
Ukraine-related Disinformation Dataset
Ukraine-related disinformation dataset from "Comparative Analysis of Engagement, Themes, and Causality of Ukraine-Related Debunks and Disinformation" (accepted at SocInfo...
Ego Networks of Words in Twitter
This set of dataframes was used in our latest paper: Ollivier K, Boldrini C, Passarella A, Conti M (2022) Structural invariants and semantic fingerprints in the “ego network”...
Cross-Lingual Dataset of Crisis-Related Social Media
If you use this dataset, please cite the following paper: Fedor Vitiugin, Carlos Castillo: Cross-Lingual Query-Based Summarization of Crisis-Related Social Media: An Abstractive...
Dataset for Evaluating Abstractive Summaries of Crisis-Related Social Media
This dataset was created to evaluate summaries generated from social media posts published during five natural disasters. The dataset contains: ground truth reports created by human...