5 items found

Organisations: SoBigData Services and Products Tags: Web mining Information Retrieval

Filter Results
  • Dataset

    SWH Filenames

    A 69 GB dataset with ~2.3 billion strings representing deduplicated names of source code files collected by Software Heritage, the great library of source code...
    • ZIP
      The resource: 'SWH Filenames' is not accessible as guest user. You must login to access it!
  • Method

    Python library for direct and indirect discrimination prevention in data mining

    This python library implements the discrimination discovery and prevention method proposed in the paper: “A methodology for direct and indirect discrimination prevention in...
    • GitHub
      The resource: 'Link to library' is not accessible as guest user. You must login to access it!
  • Method

    Quantum Distance-Based Classifier

    The Quantum Distance-Based Classifier is a technique inspired by the classical k-Nearest Neighbors that leverages quantum properties to perform prediction.
  • Method

    Dictionary creator

    This tool creates a dictionary with inverse document frequency (idf) values from the Google NGrams dataset.
    • The resource: 'Source code' is not accessible as guest user. You must login to access it!
  • Dataset

    Wikipedia Word Embeddings

    Embeddings were created through applying word2vec skipgram to a corpus of wikipedia non-stub articles from a December 2015 English dump with the following parameters: -cbow 0...
    • The resource: 'Embeddings' is not accessible as guest user. You must login to access it!
You can also access this registry using the API (see API Docs).