-
Private PoliModal Corpus
The corpus includes the transcripts of 56 TV face-to-face interviews for a total of 14 hours, taken from the Italian political talk show Mezz'ora in più broadcast from 24... -
BioTAGME: A comprehensive platform for biological knowledge network analysis
This network was built through BioTAGME, a system that combines TAGME, an entity-annotation framework based on the Wikipedia corpus, with a network-based inference methodology (i.e.,... -
Emergency Tweets 2016 Amatrice earthquake
This dataset contains Italian tweets related to the 2016 earthquake in central Italy (https://it.wikipedia.org/wiki/Terremoto_del_Centro_Italia_del_2016_e_d...). is...-
ZIP: EAQ-AMA.zip
-
Emergency Tweets 2013 Sardinia flood
This dataset is related to the floods that occurred in the Sardinia regional district between 17 and 19 November 2013 (https://en.wikipedia.org/wiki/2013_Sardinia_floods), as...-
ZIP: FLO-SAR.zip
-
Emergency Tweets 2009 L'Aquila earthquake
This dataset comprises 1,100 Italian tweets shared in the aftermath of the 2009 L’Aquila earthquake (https://en.wikipedia.org/wiki/2009_L%27Aquila_earthquake). The earthquake...-
ZIP: EAQ-LAQ.zip
-
Emergency Tweets 2013 Milan blackout
This dataset is related to a power outage (i.e., a blackout) that occurred in the city of Milan, in northern Italy, on the night between 14 and 15 May 2013. Despite not...-
CSV: PWO-MIL_tweets.csv
-
Emergency Tweets 2011 Christchurch earthquake
This dataset contains tweets related to the devastating earthquake that occurred on 22 February 2011, at around 12 p.m. local time, in Christchurch, New Zealand...-
CSV: EAQ-CHR_tweets.csv
ZIP: geo-annotated tweets.zip
-
Emergency Tweets 2014 Genoa flood
This dataset contains Italian tweets collected during and in the aftermath of the floods that occurred near the city of Genoa between 9 and 11 October 2014...-
ZIP: FLO-GEN.zip
-
Emergency Tweets 2012 Emilia earthquake
This dataset contains 3,170 Italian tweets about the earthquakes that struck the Emilia Romagna regional district in Italy on 20 May 2012, starting from 4 a.m. local time...-
ZIP: EAQ-EML.zip
-
Twitter dataset about two premier UK music festivals
The dataset contains Twitter posts about two premier UK music festivals: Creamfields 2016 (August 25th-28th) and VFestival 2016 (August 20th-21st).-
GitHub: Twitter dataset about two ...
-
Broad Twitter Corpus
The Broad Twitter Corpus is a named entity-annotated dataset of tweets, collected in order to capture temporal, spatial and social diversity. The goal of the corpus is to...-
JSON: Broad Twitter Corpus
-
Sheffield NERD Tweet Corpus
The dataset contains 794 tweets annotated with named entities disambiguated against DBpedia, and split into equally sized training and test portions. 400 tweets from 2013 come...-
FINF: Sheffield NERD Tweet Corpus
-
Wikinews dataset
This dataset consists of a sample of 365 news articles published by Wikinews from November 2004 to June 2014 and annotated with about 5000 entities, each associated with a saliency...-
JSON: entity-saliency
-
The Italian Music Dataset
The dataset is built by exploiting the Spotify and SoundCloud APIs. It is composed of over 14,500 different songs of both famous and less famous Italian musicians. Each song...-
JSON: Dataset
-
WIRE dataset
This dataset consists of 503 pairs of Wikipedia entities drawn from the New York Times dataset, each with a human-assigned relatedness score. The domain experts based their... -
Wikipedia Word Embeddings
Embeddings were created by applying word2vec skip-gram to a corpus of Wikipedia non-stub articles from a December 2015 English dump with the following parameters: -cbow 0... -
Amazon reviews
This (link to the) dataset contains product reviews and metadata from Amazon, including 142.8 million reviews spanning May 1996 - July 2014. This dataset includes reviews...-
HTML: Julian McAuley's repository
-
Conversational search dataset with labels
CAsT 2019 data is split into two files, one for training and one for testing. - Training set: CAsT 2019 conversations from training set and from test set without... -
Learning to quantify: LeQua 2022 datasets
The aim of LeQua 2022 (the 1st edition of the CLEF “Learning to Quantify” lab) is to allow the comparative evaluation of methods for “learning to quantify” in textual...