70 items found

Tags: Web data

  • TrainingMaterial

    Jupyter Notebooks

    King’s College London has developed complete stories around Jupyter Notebooks that form easy recipes for reproducible methods in social data science. Jupyter...
    • ZIP: Historical Cultures Repository
    • ZIP: Prediction Modelling ...
    • ZIP: Social and Cultural ...
    • ZIP: Social Sensing Repository
    • ZIP: Visual Arts Repository
    • ZIP: Ananke Guide
    • MP4: Ananke Guide Video
  • TrainingMaterial

    Efficiency - Effectiveness Trade-offs in Learning to Rank

    This tutorial provides an 'Introduction to Learning to Rank' and focuses on 'Dealing with the Efficiency/Effectiveness trade-off in Web Search'. Moreover, it provides two...
    • PDF: Introduction to Learning ...
    • PDF: Dealing with the ...
    • Python: Hands-on Session 1
    • Python: Hands-on Session 2
    • PDF: Publicly available ...
    • ZIP: Istella Learning to Rank ...
  • TrainingMaterial

    Interactive Learning Environments

    King’s College London developed a variety of data science materials based on R and Python. R is a de facto standard in statistical computing and visualisation, while our...
    • ZIP: RStudio Docker image
    • ZIP: VirtualBox
    • ZIP: Swirl courses
  • TrainingMaterial

    SoBigData Plus Plus e-infrastructure

    In this webinar, we introduce the SoBigData R.I. and show its main features. We explain how to integrate a new method, how to execute an...
    • HTML: SoBigData++ e-infrastructure
    • PDF: Webinar Introduction
    • PDF: SoBigData e-Infrastructure
    • PDF: Execute an experiment
    • PDF: Integrate a new Method
    • PDF: Integrate a new Dataset
  • Dataset

    Air Quality Datasets over L'Aquila Region

    These datasets were collected by ESA, CeTEMPS and ARTA. They are a work-in-progress deliverable of a virtual laboratory (VL-Disaster) developed in the context of SoBigData.
    • CSV: CeTEMPS Dataset up to 2023
    • CSV: ARTA AirQuality up to 2023
    • ZIP: ESA Sentinel 5P NO2 daily ...
    • HTML: Map of the area pollutants ...
    • ZIP: Dataset
  • Method

    Cybersecurity NER BERT-base-cased model (private, access required)

    This method includes a Python script and files of a BERT-base-cased model fine-tuned on our Cybersecurity NER dataset. The method requires as input a list of sentences that...
  • Method

    Cybersecurity NER RoBERTa-base model

    This method includes a Python script and files of a RoBERTa-base model fine-tuned on our Cybersecurity NER dataset. The method requires as input a list of sentences that will...
    • JSON: config
    • TXT: merges
    • BIN: model
    • JSON: model_args
    • ZIP: scheduler
    • JSON: special_tokens_map
    • JSON: tokenizer_config
    • ZIP: training_args
    • JSON: tokenizer
    • JSON: vocab
    • ZIP: optimizer
    • Python: inference
  • Method

    Cybersecurity NER SecureBERT model

    This method includes a Python script and files of a SecureBERT model fine-tuned on our Cybersecurity NER dataset. The method requires as input a list of sentences that will be...
    • JSON: config
    • TXT: merges
    • BIN: model
    • JSON: model_args
    • ZIP: optimizer
    • ZIP: scheduler
    • JSON: special_tokens_map
    • JSON: tokenizer
    • JSON: tokenizer_config
    • ZIP: training_args
    • TXT: vocab
    • Python: inference
  • Method

    Dynamical Linear Upper Confidence Bound (DynLin-UCB) (private, access required)

    The repository contains the code to run DynLin-UCB (Dynamical Linear Upper Confidence Bound). DynLin-UCB is an optimistic regret-minimization algorithm that can be used to...
  • Dataset

    Multi-Task Faces (MTF) dataset

    The Multi-Task Faces (MTF) dataset consists of cropped human faces for classification tasks or other research purposes. Each image in the dataset is labelled according to four...
    • ZIP: MTF_dataset_20230701
  • Dataset

    Cybersecurity NER dataset (private, access required)

    Our dataset was created by merging the APTNER and CyNER datasets and contains 13,601 sentences, 347,779 tokens, and 37,684 entities. The split ratio was roughly 70% for training and...
  • Dataset

    Spotify Tracks Dataset (full)

    The dataset was created using the Spotify API and the track IDs provided by the authors of https://www.kaggle.com/datasets/maharshipandya/-spotify-tracks-dataset.... The...
    • std_full
  • Dataset

    Spotify track dataset (small)

    The dataset was created using the Spotify API and the track IDs provided by the authors of https://www.kaggle.com/datasets/maharshipandya/-spotify-tracks-dataset.... The...
    • ZIP: std_small
  • Dataset

    SWH Filenames

    A 69 GB dataset with ~2.3 billion strings representing deduplicated names of source code files collected by Software Heritage, the great library of source code...
    • ZIP: SWH Filenames
  • Dataset

    Smart Cities Weather and Pollution conditions (private, access required)

    A set of weather and climatic conditions gathered during the Toolsmart PON project (Open Community PA 2020 – PON Governance 2014-2020). Data are obtained from IoT based...
  • Dataset

    GiveMeSomeCreditSC

    The GiveMeSomeCredit dataset (https://www.kaggle.com/c/GiveMeSomeCredit) contains different features of borrowers. The task is predicting the financial distress of a...
    • ZIP: GiveMeSomeCreditSC
  • Dataset

    Santorini Tweets July-August 2021

    This dataset contains 225,501 tweets written by 141,277 users. These tweets are geolocated in Santorini, or they contain the word or the hashtag "santorini" in the text. They...
    • ZIP: tweet_santorini.csv
  • Dataset

    Post-earthquake Reconstruction Progress Datasets over L'Aquila City (private, access required)

    Reconstruction datasets provided by the national public entities USRA and USRC. These datasets are stored in CSV files and provide comprehensive information related to...
  • Dataset

    HANSEN: Spoken Text Authorship Analysis

    HANSEN encompasses a meticulous curation of existing speech datasets accompanied by transcripts, alongside the creation of novel AI-generated spoken text datasets....
    • Datasets
You can also access this registry using the API (see API Docs).
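The listing does not document the API itself. As a minimal sketch, assuming the registry exposes a CKAN-style action API (an assumption, not confirmed here), the tag filter and the "items found" count could be reproduced with a `package_search` call. `BASE_URL` is a hypothetical placeholder, and `build_search_url`, `summarize` and `search_by_tag` are illustrative helper names, not part of any documented client.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical placeholder: the real registry base URL is not given in this listing.
BASE_URL = "https://registry.example.org"


def build_search_url(base_url: str, tag: str, rows: int = 20, start: int = 0) -> str:
    """Build a package_search URL filtered by one tag (assumes a CKAN-style action API)."""
    params = {
        "fq": f'tags:"{tag}"',  # Solr filter query on the tags field
        "rows": rows,           # page size
        "start": start,         # offset used for paging through results
    }
    return f"{base_url}/api/3/action/package_search?{urlencode(params)}"


def summarize(response: dict) -> tuple[int, list[str]]:
    """Return the total hit count and the titles on this page of a package_search response."""
    if not response.get("success"):
        raise RuntimeError("API call failed")
    result = response["result"]
    return result["count"], [pkg["title"] for pkg in result["results"]]


def search_by_tag(base_url: str, tag: str, rows: int = 20) -> tuple[int, list[str]]:
    """Fetch one page over HTTP (needs network access and a reachable CKAN-style endpoint)."""
    with urlopen(build_search_url(base_url, tag, rows)) as resp:
        return summarize(json.load(resp))


if __name__ == "__main__":
    # Offline demo: a hand-made response shaped like CKAN's package_search output.
    fake = {"success": True,
            "result": {"count": 2, "results": [{"title": "example-1"}, {"title": "example-2"}]}}
    print(build_search_url(BASE_URL, "Web data", rows=5))
    print(summarize(fake))
```

Keeping URL construction and response parsing separate from the network call makes the sketch testable offline; paging through all results would repeat the request with increasing `start` until `count` is reached.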