20 items found

Tags: Web mining

  • Dataset

    Common Crawl Financial News Dataset

    This dataset contains financial articles related to companies in the S&P 500 index for the period from September 2016 to February 2020. The articles were extracted from the...
    • CSV: 'Common_Crawl_Financial_News'
  • Dataset

    SWH Filenames

    A 69 GB dataset with ~2.3 billion strings representing deduplicated names of source code files collected by Software Heritage, the great library of source code...
    • ZIP: 'SWH Filenames'
  • Dataset

    FAIR-SWENG: dataset on gender fairness in software engineering academic lands...

    The dataset contains academic performance metrics of Software Engineers worldwide.
  • TrainingMaterial

    Introduction to Data Curation

    This course is an introduction to data collection, data preparation & transformation and data analysis. It contains the essential concepts for a researcher in order to...
    • PDF: 'Introduction to Data Curation'
  • Method

    GATE Cloud URL Domain Analysis

    A service that takes a list of URLs and annotates each with what multiple organisations that analyse the credibility of online content have said about the domain (or...
    • method-engine: 'Method Engine'
  • Method

    Python library for direct and indirect discrimination prevention in data mining

    This Python library implements the discrimination discovery and prevention method proposed in the paper: “A methodology for direct and indirect discrimination prevention in...
    • GitHub: 'Link to library'
  • Experiment

    Forecasting the market value of soccer players from soccer-logs and social me...

    This experiment aims to develop a methodology to monitor and predict the market value of professional soccer players given their performance computed from soccer-logs and...
    • PDF: 'Misinformation Detection ...'
    • PDF: 'DEAP-FAKED: Knowledge ...'
    • PDF: 'Research Article'
  • Method

    Quantum Distance-Based Classifier

    The Quantum Distance-Based Classifier is a technique inspired by the classical k-Nearest Neighbors that leverages quantum properties to perform prediction.
  • Method

    Dictionary creator

    This tool creates a dictionary with inverse document frequency (idf) values from the Google NGrams dataset.
    • 'Source code'
  • Dataset

    Wikipedia Word Embeddings

    Embeddings were created by applying word2vec skip-gram to a corpus of Wikipedia non-stub articles from a December 2015 English dump with the following parameters: -cbow 0...
    • 'Embeddings'
  • Method

    Detecting Content That Triggers Polarization in Social Networks

    We provide a method that finds echo chambers in online social networks. The method considers controversial content and finds users of the network who discuss this content...
  • TrainingMaterial

    GATE Course

    The material is the 2017 version of a week-long training course delivered annually by the GATE team. Over almost ten years, this course has been developed to provide basic and...
    • PDF: 'Module 1 - Introduction to ...'
    • ZIP: 'Module 1 - Hands-on materials'
    • PDF: 'Module 1 - Introduction to ...'
    • PDF: 'Module 1 - Introduction to ...'
    • ZIP: 'Module 1 - Hands-on ...'
    • PDF: 'Module 1 - Advanced JAPE'
    • ZIP: 'Module 1 - Hands-on ...'
    • PDF: 'Module 2 - Crowdsourcing ...'
    • ZIP: 'Module 2 - Hands-on ...'
    • PDF: 'Module 2 - GATE Mímir and ...'
    • ZIP: 'Module 2 - Hands-on ...'
    • PDF: 'Module 2 - Introduction to ...'
    • PDF: 'Module 2 - Classification ...'
    • ZIP: 'Module 2 - Classification ...'
    • ZIP: 'Module 2 - GATE ...'
    • PDF: 'Module 2 - Chunking - ...'
    • ZIP: 'Module 2 - Hands-on ...'
    • PDF: 'Module 3 - GATE and Social ...'
    • PDF: 'Module 3 - GATE and Social ...'
    • PDF: 'Module 3 - GATE and Social ...'
    • PDF: 'Module 3 - GATE and Social ...'
    • PDF: 'Module 3 - GATE and Social ...'
    • PDF: 'Module 3 - GATE and Social ...'
    • ZIP: 'Hands-on materials for ...'
    • PDF: 'Module 4 - Advanced GATE ...'
    • ZIP: 'Module 4 - Hands-on ...'
    • PDF: 'Module 4 - Opinion Mining'
    • ZIP: 'Module 4 - Hands-on ...'
    • PDF: 'Module 5 - The GATE ...'
    • ZIP: 'Module 5 - Hands-on ...'
    • PDF: 'Module 5 - Creating new ...'
    • ZIP: 'Module 5 - Hands-on ...'
    • PDF: 'Module 5 - Advanced GATE ...'
    • PDF: 'Module 5 - Advanced GATE ...'
    • ZIP: 'Module 5 - Hands-on ...'
    • PDF: 'Module 6 - Applications - ...'
    • PDF: 'Module 6 - Applications - ...'
    • PDF: 'Module 6 - Applications - ...'
    • ZIP: 'Module 6 - Hands-on ...'
    • PDF: 'Module 6 - Entity Linking'
    • PDF: 'Module 6 - JAPE Practical ...'
    • ZIP: 'Module 6 - Hands-on ...'
    • PDF: 'Module 6 - Summarisation ...'
  • TrainingMaterial

    Data Mining and Machine Learning Module

    The module provides an introduction to the basic concepts of data mining and the knowledge extraction process, introducing analytical models and algorithms for clustering,...
    • PDF: 'Introduction'
    • PDF: 'Case Studies Outline'
    • PDF: 'Data Preparation and ...'
    • PDF: 'Clustering'
    • PDF: 'Classification'
    • PDF: 'Machine Learning and Data ...'
    • PDF: 'Fraud Detection'
    • PDF: 'Exemplar Projects on ...'
  • TrainingMaterial

    High Performance and Scalable Analytics Module

    Mining with big data, or big data mining, has become an active research area. Running current analytical methodologies and software tools on a single personal computer cannot...
    • PDF: 'Introduction to Parallel ...'
    • PDF: 'Introduction to Hadoop'
    • PDF: 'Hadoop Patterns'
    • PDF: 'Remote Connection and HDFS'
    • ZIP: 'Exercises for Remote ...'
    • PDF: 'Introduction to Spark'
    • ZIP: 'Exercises for Introduction ...'
    • PDF: 'Introduction to Spark SQL'
    • ZIP: 'Exercises for Introduction ...'
    • PDF: 'Hadoop Ecosystem and ...'
    • PDF: 'Data Mining with Spark (MLLIB)'
    • ZIP: 'Exercises for Data Mining ...'
  • TrainingMaterial

    Archive Spark

    An Apache Spark framework for easy data processing, extraction, and derivation for archival collections. Originally developed for use with Web archives, it has now...
    • PDF: 'Archive Spark Slides'
    • ZIP: 'Archive Spark Jupyter ...'
  • TrainingMaterial

    Archive Crawling

    Web archives are typically very broad in scope and extremely large in scale. This makes data analysis appear daunting, especially for non-computer scientists. These...
    • PDF: 'Archive Crawling Tutorial'
    • ZIP: 'Extracting Event-Centric ...'
You can also access this registry using the API (see API Docs).