Items - SoBigData.eu Catalogue

Dataset

Private PoliModal Corpus

The corpus includes the transcripts of 56 TV face-to-face interviews for a total of 14 hours, taken from the Italian political talk show Mezz'ora in più broadcast from 24...

Dataset

BioTAGME: A comprehensive platform for biological knowledge network analysis

This network was built through BioTAGME, a system that combines TAGME, an entity-annotation framework based on the Wikipedia corpus, with a network-based inference methodology (i.e.,...

Dataset

Emergency Tweets 2016 Amatrice earthquake

This dataset contains Italian tweets related to the 2016 earthquake in central Italy (https://it.wikipedia.org/wiki/Terremoto_del_Centro_Italia_del_2016_e_d...). is...

ZIP
Resource: 'EAQ-AMA.zip' (login required)

Dataset

Emergency Tweets 2013 Sardinia flood

This dataset is related to the floods that occurred in the Sardinia regional district between 17 and 19 November 2013 (https://en.wikipedia.org/wiki/2013_Sardinia_floods), as...

ZIP
Resource: 'FLO-SAR.zip' (login required)

Dataset

Emergency Tweets 2009 L'Aquila earthquake

This dataset comprises 1,100 Italian tweets shared in the aftermath of the 2009 L’Aquila earthquake (https://en.wikipedia.org/wiki/2009_L%27Aquila_earthquake). The earthquake...

ZIP
Resource: 'EAQ-LAQ.zip' (login required)

Dataset

Emergency Tweets 2013 Milan blackout

This dataset is related to a power outage (i.e., a blackout) that occurred in the city of Milan, in northern Italy, on the night between 14 and 15 May 2013. Despite not...

CSV
Resource: 'PWO-MIL_tweets.csv' (login required)

Dataset

Emergency Tweets 2011 Christchurch earthquake

This dataset contains tweets related to the devastating earthquake that occurred on 22 February 2011, at around 12 p.m. local time, in Christchurch, New Zealand...

CSV
Resource: 'EAQ-CHR_tweets.csv' (login required)

Dataset

Geo-annotated tweets ENG-ITA

ZIP
Resource: 'geo-annotated tweets.zip' (login required)

Dataset

Emergency Tweets 2014 Genoa flood

This dataset contains Italian tweets collected during and in the aftermath of the floods that occurred near the city of Genoa between 9 and 11 October 2014...

ZIP
Resource: 'FLO-GEN.zip' (login required)

Dataset

Emergency Tweets 2012 Emilia earthquake

This dataset contains 3,170 Italian tweets about the earthquakes that struck the Emilia-Romagna regional district in Italy on 20 May 2012 starting from 4 a.m. local time...

ZIP
Resource: 'EAQ-EML.zip' (login required)

Dataset

Twitter dataset about two premier UK music festivals

The dataset contains Twitter posts about two premier UK music festivals: Creamfields 2016 (on August 25th-28th) and VFestival 2016 (on August 20th-21st).

Github
Resource: 'Twitter dataset about two ...' (login required)

Dataset

Broad Twitter Corpus

The Broad Twitter Corpus is a named entity-annotated dataset of tweets, collected in order to capture temporal, spatial and social diversity. The goal of the corpus is to...

JSON
Resource: 'Broad Twitter Corpus' (login required)

Dataset

Sheffield NERD Tweet Corpus

The dataset contains 794 tweets annotated with named entities disambiguated against DBpedia, and split into equally sized training and test portions. 400 tweets from 2013 come...

FINF
Resource: 'Sheffield NERD Tweet Corpus' (login required)

Dataset

Wikinews dataset

This dataset consists of a sample of 365 news articles published by Wikinews from November 2004 to June 2014 and annotated with about 5000 entities, each associated with a saliency...

JSON
Resource: 'entity-saliency' (login required)

Dataset

The Italian Music Dataset

The dataset is built by exploiting the Spotify and SoundCloud APIs. It is composed of over 14,500 different songs of both famous and less famous Italian musicians. Each song...

JSON
Resource: 'Dataset' (login required)

Dataset

WIRE dataset

This dataset consists of 503 pairs of Wikipedia entities drawn from the New York Times dataset with a human-assigned relatedness score. The domain experts based their...

HTML
Resource: 'WikipediaRelatedness' (login required)
CSV
Resource: 'WIRE dataset' (login required)

Dataset

Wikipedia Word Embeddings

Embeddings were created by applying word2vec skip-gram to a corpus of Wikipedia non-stub articles from a December 2015 English dump, with the following parameters: -cbow 0...

Resource: 'Embeddings' (login required)

Dataset

Amazon reviews

This (link to the) dataset contains product reviews and metadata from Amazon, including 142.8 million reviews spanning May 1996 - July 2014. This dataset includes reviews...

HTML
Resource: 'Julian McAuley's repository.' (login required)

Dataset

Conversational search dataset with labels

CAsT 2019 data is split into two files, one for training and the other for testing. - Training set: CAsT 2019 conversations from the training set and from the test set without...

Resource: 'Conversational dataset ...' (login required)

Dataset

Learning to quantify: LeQua 2022 datasets

The aim of LeQua 2022 (the 1st edition of the CLEF “Learning to Quantify” lab) is to allow the comparative evaluation of methods for “learning to quantify” in textual...

Resource: 'Zenodo link' (login required)

44 items found
