4 items found

Organisations: SoBigData Services and Products Groups: sobigdata-it Tags: Text mining Information Retrieval

Filter Results
  • Dataset

    SWH Filenames

    A 69 GB dataset with ~2.3 billion strings representing deduplicated names of source code files collected by Software Heritage, the great library of source code...
    • ZIP
      The resource: 'SWH Filenames' is not accessible as guest user. You must login to access it!
  • Dataset

    FANCY Dataset

    (NLI) FANCY (FActivity, Negation, Common-sense, hYpernimy) is a new dataset with 4000 sentence pairs concerning complex linguistic phenomena such as factivity, negation,...
    • The resource: 'FANCY Dataset' is not accessible as guest user. You must login to access it!
  • Dataset

    Santorini Tweets July-August 2021

    This dataset contains 225.501 tweets written by 141.277 users. These tweets are geolocated in Santorini, or they contain the word or the hashtag "santorini" in the text. They...
    • ZIP
      The resource: 'tweet_santorini.csv' is not accessible as guest user. You must login to access it!
  • Access required...

    ×

    Dataset

    Private Italian Thesaurus for Tourism domain

    An Italian thesaurus in the domain of the Tourism, counting 2,684 concepts, organized according to semantic relationships (equivalence, hierarchical and associative). The...
You can also access this registry using the API (see API Docs).