31 items found

Tags: Web data

  • SoBigData.eu: Dataset

    Global Peace Index data

    A dataset of the Global Peace Index (GPI), which ranks 163 independent states and territories according to their level of peacefulness. The GPI covers 99.7 per cent of the...
  • SoBigData.eu: Dataset

    Yeast

    The yeast dataset is a collection of yeast microarray expressions and phylogenetic profiles which can be used to learn the yeast gene functional categories. One row of this...
    • arff
      The resource: 'Yeast Dataset' is not accessible as guest user. You must login to access it!
  • SoBigData.eu: Dataset

    Medical Dataset

    The medical dataset contains a corpus of fully anonymized clinical text. Each document in the corpus is associated with a set of ICD-9 codes which represents the diagnosis...
    • ZIP
      The resource: 'Medical Dataset' is not accessible as guest user. You must login to access it!
    • CSV
      The resource: 'Churn Dataset' is not accessible as guest user. You must login to access it!
  • SoBigData.eu: Dataset

    German Credit

    In the German Credit dataset, each of the 1,000 persons is classified as a good or bad creditor according to attributes such as age, sex, checking_account, credit_amount,...
    • CSV
      The resource: 'German Credit' is not accessible as guest user. You must login to access it!
  • SoBigData.eu: Dataset

    Compas

    The Compas dataset contains the features used by the COMPAS algorithm for scoring defendants and their risk (Low, Medium and High), for over 4,000 individuals. We considered...
    • CSV
      The resource: 'https://www' is not accessible as guest user. You must login to access it!
  • SoBigData.eu: Dataset

    Dataset Adult

    The Adult dataset includes 48,842 instances with demographic information such as age, workclass, marital-status, race, capital-loss, capital-gain, etc. The income attribute...
    • CSV
      The resource: 'Adult' is not accessible as guest user. You must login to access it!
  • SoBigData.eu: TrainingMaterial

    Introduction to Data Curation

    This course is an introduction to data collection, data preparation and transformation, and data analysis. It contains the essential concepts for a researcher in order to...
    • PDF
      The resource: 'Introduction to Data Curation' is not accessible as guest user. You must login to access it!
  • SoBigData.eu: Application

    SMAPH Query Entity Linker

    The SMAPH system links queries to the entities they mention, disambiguating mentions if needed. Entities are Wikipedia pages. This problem is known as "entity recognition and...
    • HTML
      The resource: 'SMAPH documentation' is not accessible as guest user. You must login to access it!
  • SoBigData.eu: TrainingMaterial

    Private Jupyter Notebooks

    King’s College London has developed complete stories around Jupyter Notebooks that form easy recipes for reproducible methods in social data science. Jupyter...
  • SoBigData.eu: TrainingMaterial

    Private Interactive Learning Environments

    King’s College London developed a variety of data science materials based on R and Python. R is a de facto standard in statistical computing and visualisation, while our...
  • SoBigData.eu: Dataset

    Wikipedia Word Embeddings

    Embeddings were created by applying word2vec skipgram to a corpus of Wikipedia non-stub articles from a December 2015 English dump with the following parameters: -cbow 0...
    • The resource: 'Embeddings' is not accessible as guest user. You must login to access it!
  • SoBigData.eu: Dataset

    Word Sense Evolution Testset

    This testset consists of 23 terms which have experienced word sense change during the past centuries. The main changes for each term were found using Wikipedia, dictionary.com...
    • ZIP
      The resource: 'WSE-testset.zip' is not accessible as guest user. You must login to access it!
  • SoBigData.eu: Dataset

    Retail Market Data

    This dataset contains retail market data on food products, from 2007, for about 130 shops of an Italian distribution chain. The data cover about 1 million active clients, and...
  • SoBigData.eu: Dataset

    NYSE transactions

    This dataset contains financial data on the prices of the 250 most liquid assets on the New York Stock Exchange (NYSE) from 2006 to 2014. The dataset contains transactions,...
  • SoBigData.eu: Dataset

    FED data

    Quarterly data on US banks' holdings from March 2001 to September 2013. The number of financial institutions in the data is fairly stable across quarters, starting from...
  • SoBigData.eu: Dataset

    ClueWeb12

    The ClueWeb12 dataset consists of 733,019,372 English web pages, collected between February 10, 2012 and May 10, 2012. It was created to support research on information...
  • SoBigData.eu: Dataset

    ClueWeb09

    The ClueWeb09 dataset consists of about 1 billion web pages in ten languages that were collected in January and February 2009. It was created to support research on...
  • SoBigData.eu: Dataset

    Articles and comments of major Estonian newspapers

    The dataset contains articles and comments from four major Estonian news portals, from the early 2000s to 2016.
  • SoBigData.eu: Dataset

    UK election abuse data

    The GATE team (gate.ac.uk) at the University of Sheffield have collected 1.4 million tweets sent to and by UK members of parliament in the months leading up to the 2015 and...
    • XLS
      The resource: 'uk-election-abuse.tar.gz' is not accessible as guest user. You must login to access it!
You can also access this registry using the API (see API Docs).