16 items found

Tags: Web data Text mining

  • Dataset

    SWH Filenames

    A 69 GB dataset with ~2.3 billion strings representing deduplicated names of source code files collected by Software Heritage, the great library of source code...
    • ZIP
      SWH Filenames
  • Dataset

    Santorini Tweets July-August 2021

    This dataset contains 225,501 tweets written by 141,277 users. These tweets are geolocated in Santorini, or they contain the word or the hashtag "santorini" in the text. They...
    • ZIP
      tweet_santorini.csv
  • TrainingMaterial

    Introduction to Data Curation

    This course is an introduction to data collection, data preparation & transformation and data analysis. It contains the essential concepts for a researcher in order to...
    • PDF
      Introduction to Data Curation
  • Dataset

    Articles and comments of major Estonian newspapers

    The dataset contains articles and comments from four major Estonian news portals, from the early 2000s to 2016.
    • PDF
      Misinformation Detection ...
    • PDF
      DEAP-FAKED: Knowledge ...
    • PDF
      Research Article
  • Application

    SMAPH Query Entity Linker

    The SMAPH system links queries to the entities they mention, disambiguating mentions if needed. Entities are Wikipedia pages. This problem is known as "entity recognition and...
    • HTML
      SMAPH documentation
  • Method

    Quantum Distance-Based Classifier

    The Quantum Distance-Based Classifier is a technique inspired by the classical k-Nearest Neighbors that leverages quantum properties to perform prediction.
  • Dataset

    The Italian Music Dataset

    The dataset is built by exploiting the Spotify and SoundCloud APIs. It is composed of over 14,500 different songs of both famous and less famous Italian musicians. Each song...
    • JSON
      Dataset
  • Method

    ArchiveSpark

    ArchiveSpark is an Apache Spark framework for easy data access, processing, extraction as well as derivation for Web archives and archival collections. It has a simple and...
    • ArchiveSpark on GitHub
  • Dataset

    Wikipedia Word Embeddings

    Embeddings were created by applying word2vec skip-gram to a corpus of Wikipedia non-stub articles from a December 2015 English dump with the following parameters: -cbow 0...
    • Embeddings
  • Dataset

    Product Reviews for Ordinal Quantification

    This data set comprises a labeled training set, validation samples, and testing samples for ordinal quantification. It appears in our research paper "Ordinal Quantification...
    • Zenodo link
  • TrainingMaterial

    Interactive Learning Environments

    King’s College London developed a variety of data science materials based on R and Python. R is a de facto standard in statistical computing and visualisation, while our...
    • ZIP
      Rstudio docker image
    • ZIP
      VirtualBox
    • ZIP
      Swirl courses
  • TrainingMaterial

    Efficiency - Effectiveness Trade-offs in Learning to Rank

    This tutorial provides an 'Introduction to Learning to Rank' and focuses on 'Dealing with the Efficiency/Effectiveness trade-off in Web Search'. Moreover, it provides two...
    • PDF
      Introduction to Learning ...
    • PDF
      Dealing with the ...
    • Python
      Hands-on Session 1
    • Python
      Hands-on Session 2
    • PDF
      Publicly available ...
    • ZIP
      Istella Learning to Rank ...
  • TrainingMaterial

    Jupyter Notebooks

    King’s College London has developed complete stories around Jupyter Notebooks that form easy recipes for reproducible methods in social data science. Jupyter...
    • ZIP
      Historical Cultures Repository
    • ZIP
      Prediction Modelling ...
    • ZIP
      Social and Cultural ...
    • ZIP
      Social Sensing Repository
    • ZIP
      Visual Arts Repository
    • ZIP
      Ananke Guide
    • MP4
      Ananke Guide Video
You can also access this registry using the API (see API Docs).