33 items found

Tags: Web data

  • Dataset

    CoPhIR

    The CoPhIR (Content-based Photo Image Retrieval) Test-Collection was developed to carry out significant tests of the scalability of the SAPIR project infrastructure (SAPIR:...
    • Resource: 'cophir.isti.cnr.it' (login required)
  • Dataset

    Official administrative information of Tuscany

    The data contains the spatial partitioning of Tuscany and some statistical information published by the Italian Statistical Bureau.
    • Resource (LOD): 'Linked Open Data' (login required)
  • TrainingMaterial

    Private Jupyter Notebooks

    King’s College London has developed complete stories around Jupyter Notebooks that form easy recipes for reproducible methods in social data science. Jupyter...
  • TrainingMaterial

    Private Efficiency - Effectiveness Trade-offs in Learning to Rank

    This tutorial provides an 'Introduction to Learning to Rank' and focuses on 'Dealing with the Efficiency/Effectiveness trade-off in Web Search'. Moreover, it provides two...
  • TrainingMaterial

    Private Interactive Learning Environments

    King’s College London developed a variety of data science materials based on R and Python. R is a de facto standard in statistical computing and visualisation, while our...
  • Dataset

    Wikipedia Word Embeddings

    Embeddings were created by applying word2vec skip-gram to a corpus of non-stub Wikipedia articles from a December 2015 English dump with the following parameters: -cbow 0...
    • Resource: 'Embeddings' (login required)
  • TrainingMaterial

    SoBigData Plus Plus e-infrastructure

    This webinar introduces the SoBigData R.I. and shows its main features. It explains how to integrate a new method, how to execute an...
    • Resource (HTML): 'SoBigData++ e-infrastructure' (login required)
  • TrainingMaterial

    Explaining Explanation Methods

    The most effective Artificial Intelligence (AI) systems exploit complex machine learning models to fulfill their tasks due to their high performance. Unfortunately, the most...
    • Resource (HTML): 'Explaining Explanation Methods' (login required)
  • TrainingMaterial

    Introduction to Data Curation

    This course is an introduction to data collection, data preparation & transformation and data analysis. It contains the essential concepts for a researcher in order to...
    • Resource (PDF): 'Introduction to Data Curation' (login required)
  • Method

    ArchiveSpark

    ArchiveSpark is an Apache Spark framework for easy data access, processing, extraction as well as derivation for Web archives and archival collections. It has a simple and...
    • Resource: 'ArchiveSpark on GitHub' (login required)
  • Dataset

    Retail Market Data

    This dataset contains retail market data on food products, from 2007, for about 130 shops of an Italian distribution chain. The data cover about 1 million active clients, and...
  • Dataset

    MSN Search query log

    The data consists of an MSN Search query log excerpt with 15 million queries, from US users, sampled over one month of activity. Data attributes made available per query: 1)...
  • Dataset

    Russell 3000 stock prices

    This dataset contains the price and volume of the 3000 stocks belonging to the Russell 3000 Index, roughly corresponding to the 3000 most capitalized stocks. Traded volume and...
  • Dataset

    DE webarchive

    The dataset consists of all the content from the .de top level domain as crawled by the Internet Archive.
    • Resource (HTML): 'Internet Archive Wayback ...' (login required)
  • Dataset

    GERDAQ Dataset

    This is a benchmark dataset of annotated search-engine queries. Mentions of entities in search-engine queries are tagged with the entity they refer to. Wikipedia is used as...
    • Resource (XML): 'GERDAQ dataset' (login required)
  • Dataset

    German Academic Web

    The dataset contains regular crawls of the websites of German academic institutions.
  • Dataset

    .ee Web archive

    .ee Web archive consisting of snapshots from 2015
  • Dataset

    Retail market dataset

    The dataset contains purchases by Unicoop Tirreno customers, along with descriptions of and information about the shops (both small shops and supermarkets) and the customers.
  • Dataset

    The Italian Music Dataset

    The dataset is built by exploiting the Spotify and SoundCloud APIs. It is composed of over 14,500 different songs of both famous and less famous Italian musicians. Each song...
    • Resource (JSON): 'Dataset' (login required)
  • Dataset

    UK election abuse data

    The GATE team (gate.ac.uk) at the University of Sheffield have collected 1.4 million tweets sent to and by UK members of parliament in the months leading up to the 2015 and...
    • Resource (XLS): 'uk-election-abuse.tar.gz' (login required)
You can also access this registry using the API (see API Docs).
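
As a rough illustration of what API access to a listing like this one might look like, the sketch below builds a search URL for items tagged "Web data" and summarizes a decoded JSON response. It assumes the registry exposes a CKAN-style Action API (`/api/3/action/package_search`), which this page does not confirm; the base URL and the helper names `build_tag_search_url` and `summarize` are hypothetical. Consult the API Docs for the actual endpoints.

```python
from urllib.parse import urlencode

# Hypothetical base URL: the listing does not state the catalogue's address.
BASE_URL = "https://catalogue.example.org"


def build_tag_search_url(tag, rows=33, base_url=BASE_URL):
    """Build a CKAN-style package_search URL that filters items by tag."""
    # Assumption: a CKAN Action API (v3) endpoint; not confirmed by this page.
    params = {"fq": f'tags:"{tag}"', "rows": rows}
    return f"{base_url}/api/3/action/package_search?{urlencode(params)}"


def summarize(response):
    """Return (count, titles) from a decoded package_search JSON response."""
    # Assumes the CKAN response shape: {"result": {"count": ..., "results": [...]}}
    result = response["result"]
    return result["count"], [item["title"] for item in result["results"]]


url = build_tag_search_url("Web data")
print(url)
```

The URL builder is kept separate from any network call so it can be checked offline; fetching the URL and passing the decoded JSON to `summarize` is left to the HTTP client of your choice.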