Items - SoBigData.eu Catalogue

Dataset

SWH Filenames

A 69 GB dataset with ~2.3 billion strings representing deduplicated names of source code files collected by Software Heritage, the great library of source code...

ZIP

Dataset

Santorini Tweets July-August 2021

This dataset contains 225,501 tweets written by 141,277 users. The tweets are either geolocated in Santorini or contain the word or the hashtag "santorini" in their text. They...

ZIP

Dataset

Articles and comments of major Estonian newspapers

The dataset contains articles and comments from four major Estonian news portals, covering the period from the early 2000s to 2016.

Experiment

Misinformation Detection on YouTube Using Video Captions

PDF

Experiment

DEAP-FAKED: Knowledge Graph based Approach for Fake News Detection

PDF

Experiment

(Mis-)leading the Covid-19 vaccination discourse on Twitter— A study of infod...

PDF

Application

SMAPH Query Entity Linker

The SMAPH system links queries to the entities they mention, disambiguating mentions if needed. Entities are represented by Wikipedia pages. This problem is known as "entity recognition and...

HTML

Method

Quantum Distance-Based Classifier

The Quantum Distance-Based Classifier is a technique inspired by the classical k-Nearest Neighbors that leverages quantum properties to perform prediction.
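
To give an idea of how a distance-based quantum classifier can turn distances into predictions, here is a classical simulation of the outcome probabilities of one well-known interference-based scheme (Schuld, Fingerhuth and Petruccione, 2017). This is an assumption: the catalogued method may use a different circuit, and the function names are hypothetical. In that scheme, after postselecting an ancilla qubit, the probability of label y is proportional to the sum of |x + x_m|^2 over the training points x_m with label y. For unit vectors this equals 4 - |x - x_m|^2, so nearer training points weigh more, much as in k-Nearest Neighbors.

```python
import numpy as np

def distance_classifier_probs(train_x, train_y, test_x):
    """Classically simulate label probabilities of an interference-based
    distance classifier (assumed scheme after Schuld et al., 2017; not
    necessarily the catalogued method).

    Every vector is L2-normalised, as amplitude encoding requires. The returned
    probabilities are conditioned on the ancilla qubit being measured in |0>.
    """
    x = np.asarray(test_x, dtype=float)
    x = x / np.linalg.norm(x)
    X = np.asarray(train_x, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    labels = np.asarray(train_y)
    # For unit vectors |x + x_m|^2 = 4 - |x - x_m|^2, so training points
    # closer to the test point contribute more weight to their label.
    weights = np.sum((X + x) ** 2, axis=1)
    total = weights.sum()
    return {lab: float(weights[labels == lab].sum() / total)
            for lab in sorted(set(train_y))}

def predict(train_x, train_y, test_x):
    """Return the label with the highest simulated probability."""
    probs = distance_classifier_probs(train_x, train_y, test_x)
    return max(probs, key=probs.get)
```

For example, `predict([[1, 0], [0, 1]], [0, 1], [1, 0.1])` returns `0`, because the test point lies closer to the training vector labelled 0. On real hardware the probabilities are estimated from repeated measurements rather than computed exactly.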

Dataset

The Italian Music Dataset

The dataset was built using the Spotify and SoundCloud APIs. It comprises over 14,500 songs by both well-known and lesser-known Italian musicians. Each song...

JSON

Method

ArchiveSpark

ArchiveSpark is an Apache Spark framework for easy data access, processing, extraction, and derivation for Web archives and archival collections. It has a simple and...

Dataset

Wikipedia Word Embeddings

Embeddings were created by applying word2vec skip-gram to a corpus of non-stub Wikipedia articles from a December 2015 English dump, with the following parameters: -cbow 0...
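
In the original word2vec tool, `-cbow 0` selects the skip-gram model, which learns to predict the words surrounding each word. The minimal sketch below shows only how (center, context) training pairs are drawn from a token sequence with a fixed window. The window size and the function name are illustrative assumptions. The real tool also randomly shrinks the window per position and subsamples frequent words, and this sketch leaves both out.

```python
def skipgram_pairs(tokens, window=2):
    """Generate (center, context) pairs as used by the skip-gram objective.

    Simplified sketch: fixed window, no random window shrinking and no
    subsampling of frequent words (both present in word2vec itself).
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs
```

With `window=1`, the sentence `["a", "b", "c"]` yields the pairs `("a", "b")`, `("b", "a")`, `("b", "c")` and `("c", "b")`. The model then learns one vector per word such that each center word's vector is good at predicting its context words.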

Dataset

Product Reviews for Ordinal Quantification

This dataset comprises a labeled training set, validation samples, and testing samples for ordinal quantification. It appears in our research paper "Ordinal Quantification...
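
Ordinal quantification estimates the prevalence of each class in a sample when the classes have a natural order, instead of labelling individual items. The sketch below is only an illustration under assumed choices, not the method of the paper. It pairs the generic "classify and count" baseline with the Earth Mover's Distance (EMD) between two distributions over ordered classes 0..n-1. The EMD is computed as the sum of absolute differences between their cumulative distributions, assuming unit cost between adjacent classes. The function names are hypothetical.

```python
from collections import Counter

def classify_and_count(predicted_labels, n_classes):
    """Estimate class prevalences as the fraction of items assigned to each class."""
    counts = Counter(predicted_labels)
    n = len(predicted_labels)
    return [counts.get(k, 0) / n for k in range(n_classes)]

def ordinal_emd(p, q):
    """Earth Mover's Distance between two distributions over ordered classes,
    assuming unit cost for moving mass between adjacent classes."""
    total, cum_p, cum_q = 0.0, 0.0, 0.0
    for pk, qk in zip(p, q):
        cum_p += pk
        cum_q += qk
        total += abs(cum_p - cum_q)
    return total
```

Unlike a measure that treats classes as unordered, the EMD penalises an estimate more when its mass is placed far from the true classes. Moving all mass from class 0 to class 2 costs 2, while moving it to class 1 costs only 1.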

12 items found