46 items found

Organisations: SoBigData Services and Products
Licenses: Creative Commons Attribution 4.0

  • Dataset

    Gene Disease Association Data and Features

    This dataset contains data that can be used for disease gene discovery purposes. The data cover ten different diseases with associated seed genes (derived from DisGeNET) and...
    • RAR
      The resource: 'Gene_Disease_Association_Da ...' is not accessible as guest user. You must login to access it!
  • Dataset

    Twitter EURO2020: BLM debate in Italy

    Twitter dataset for "Will You Take the Knee? Italian Twitter Echo Chambers' Genesis During EURO 2020". The dataset comprises the following files:...
    • JSON
      The resource: 'Twitter EURO2020' is not accessible as guest user. You must login to access it!
  • Dataset

    Reddit Echo Chamber dataset

    In a digital environment, the term echo chamber refers to an alarming phenomenon in which beliefs are amplified or reinforced by communication repetition inside a closed...
    • ZIP
      The resource: 'Reddit Echochamber' is not accessible as guest user. You must login to access it!
  • Dataset

    SWH Filenames

    A 69 GB dataset with ~2.3 billion strings representing deduplicated names of source code files collected by Software Heritage, the great library of source code...
    • ZIP
      The resource: 'SWH Filenames' is not accessible as guest user. You must login to access it!
  • Dataset

    DNA 31-mers

    A 12 GB dataset containing all the ~367M unique 31-mers in the DNA sequences available in the Pizza&Chili Corpus (https://pizzachili.dcc.uchile.cl/texts.html). This dataset...
    • ZIP
      The resource: 'DNA 31-mers' is not accessible as guest user. You must login to access it!
  • Dataset

    Compounds with Activity against the Dopamine D2 Receptor

    Database containing compounds active against the dopamine D2 receptor together with random inactive compounds as negative samples for learning purposes. Train, validation, and...
    • ZIP
      The resource: 'compound_activity_dopamine_d2' is not accessible as guest user. You must login to access it!
  • Dataset

    DNA 12-mers

    A 179 MB dataset containing all the ~14M unique 12-mers in the DNA sequences available in the Pizza&Chili Corpus (https://pizzachili.dcc.uchile.cl/texts.html). This dataset...
    • ZIP
      The resource: 'DNA 12-mers' is not accessible as guest user. You must login to access it!
  • Dataset

    Papers on Gender Bias in Academic Promotions

    This dataset contains the result of a systematic mapping study conducted to analyse how the issue of gender bias in academic promotions has been addressed by the literature....
    • CSV
      The resource: 'Dataset' is not accessible as guest user. You must login to access it!
  • Dataset

    BioTAGME: A comprehensive platform for biological knowledge network analysis

    This network was built through BioTAGME, a system that combines TAGME, an entity-annotation framework based on the Wikipedia corpus, with a network-based inference methodology (i.e.,...
  • Dataset

    GPS Tracks - Milan, Italy - Simulated

    This dataset contains simulated tracks of private cars in Milan. The dataset is generated from a real dataset of people in order to respect some general statistics and...
    • ZIP
      The resource: 'Milano Simulated Data' is not accessible as guest user. You must login to access it!
  • Dataset

    Official administrative information of Tuscany

    The data contains the spatial partitioning of Tuscany and some statistical information published by the Italian Statistical Bureau.
    • LOD
      The resource: 'Linked Open Data' is not accessible as guest user. You must login to access it!
  • Dataset

    Broad Twitter Corpus

    The Broad Twitter Corpus is a named entity-annotated dataset of tweets, collected in order to capture temporal, spatial and social diversity. The goal of the corpus is to...
    • JSON
      The resource: 'Broad Twitter Corpus' is not accessible as guest user. You must login to access it!
  • Dataset

    UK election abuse data

    The GATE team (gate.ac.uk) at the University of Sheffield have collected 1.4 million tweets sent to and by UK members of parliament in the months leading up to the 2015 and...
    • XLS
      The resource: 'uk-election-abuse.tar.gz' is not accessible as guest user. You must login to access it!
    • PDF
      The resource: 'Machine Learning ...' is not accessible as guest user. You must login to access it!
  • Method

    DebtRank Systemic Risk Estimation Method

    The DebtRank algorithm is used to estimate the impact of shocks in financial networks, as it overcomes the limitations of the traditional default-cascade approaches. The method...
    • RAR
      The resource: 'Systemic risk estimation.rar' is not accessible as guest user. You must login to access it!
    • HTML
      The resource: 'Related paper' is not accessible as guest user. You must login to access it!
  • Method

    Estimating Collective Wellbeing

    This method estimates the wellbeing of a country by using an alternative measure to GDP. The proposed measure is the average sophistication of the satisfiable needs of a...
    • ZIP
      The resource: 'EstimatingCollectiveWellbei ...' is not accessible as guest user. You must login to access it!
  • Method

    Cymrie Welsh Named Entity Recognizer

    The CYMRIE named entity recognizer is a service for the analysis of Welsh text. It identifies names of persons, locations, organizations, as well as money amounts, time and...
    • method-engine
      The resource: 'Run method' is not accessible as guest user. You must login to access it!
  • Method

    English Part Of Speech And Morphology Analyzer

    This method annotates tokens and sentences in English texts, adding part-of-speech, morphological root and affix to each token.
    • method-engine
      The resource: 'Run method' is not accessible as guest user. You must login to access it!
  • Method

    English Tweet Tokenizer

    This tool identifies words and punctuation marks in tweets.
    • method-engine
      The resource: 'Run method' is not accessible as guest user. You must login to access it!
  • Method

    French Named Entity Recognizer For Tweets

    This method analyses French tweets for names of persons, locations and organizations. It also performs normalization of abbreviations and common Twitter slang.
    • method-engine
      The resource: 'Run method' is not accessible as guest user. You must login to access it!
You can also access this registry using the API (see API Docs).
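
For readers who want to reproduce a listing like this one programmatically, the following is a minimal sketch only. It assumes the registry exposes a CKAN-style Action API with a `package_search` endpoint, which this page does not confirm. The base URL, the organisation slug, and the licence identifier are hypothetical placeholders; check the API Docs for the real values. The code only builds query URLs and makes no network request.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the page does not state the real API address.
BASE_URL = "https://catalogue.example.org/api/3/action/package_search"


def build_search_url(organization, license_id, rows=20, start=0, base_url=BASE_URL):
    """Build a CKAN-style search URL filtered by organisation and licence.

    `organization` and `license_id` are assumed to be the registry's
    internal identifiers, not the display names shown on the page.
    """
    fq = f'organization:"{organization}" AND license_id:"{license_id}"'
    params = {"fq": fq, "rows": rows, "start": start}
    return f"{base_url}?{urlencode(params)}"


def page_starts(total, rows):
    """Return the `start` offsets needed to page through `total` results."""
    return list(range(0, total, rows))


if __name__ == "__main__":
    # 46 items were found above, so three pages of 20 cover the full listing.
    for offset in page_starts(46, 20):
        print(build_search_url("org-slug", "cc-by", rows=20, start=offset))
```

Paging with `rows` and `start` keeps each response small; whether guest users may query every entry this way depends on the registry's access rules.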