DNA 31-mers

A 12 GB dataset containing all ~367M unique 31-mers (substrings of length 31) from the DNA sequences available in the Pizza&Chili Corpus (https://pizzachili.dcc.uchile.cl/texts.html).

This dataset was used to evaluate learned monotone minimal perfect hash functions (https://doi.org/10.4230/LIPIcs.ESA.2023.46).

Number of newline-separated strings: 367422516
Size of the zip-compressed dataset: 2538252372 bytes (2.54 GB)
Size of the uncompressed dataset: 11757520543 bytes (11.76 GB)
Encoding: ASCII
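
After downloading and unzipping the archive, the figures above can be checked against the file. The following is a minimal sketch, not part of the dataset's tooling: the function name `summarize_kmers` and the file path in the usage note are hypothetical, and it assumes only what the description states, namely one ASCII string per line, each expected to be 31 characters long.

```python
def summarize_kmers(path, k=31):
    """Stream a newline-separated k-mer file.

    Returns (lines, bad_length, total_bytes), where bad_length counts
    lines whose length (excluding the trailing newline) is not k.
    """
    lines = bad_length = total_bytes = 0
    # Read in binary mode: the data is ASCII, so one byte is one character,
    # and iterating line by line avoids loading ~12 GB into memory.
    with open(path, "rb") as f:
        for raw in f:
            total_bytes += len(raw)
            lines += 1
            if len(raw.rstrip(b"\n")) != k:
                bad_length += 1
    return lines, bad_length, total_bytes
```

For example, `summarize_kmers("dna-31mers.txt")` (a placeholder file name) returns the line count and byte total to compare with the number of strings and the uncompressed size listed above.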

Data and Resources
  • DNA 31-mers (ZIP)

Personal Data Attributes

Description: Personal Data related Information

Field Value
Anonymised No
ChildrenData No
Cross Border Authorised Yes
Data Protection Impact Assessment No
Ethics Committee Approval No
General Data Yes
Informed Consent Template No
Personal Data No
Personal data was manifestly made public by the data subject No
Sensitive Data No
Additional Info
Field Value
Accessibility Both
Accessibility Mode OnLine Access
Accessibility Mode Download
Availability On-Line
Basic rights Download
Basic rights Modification
Creation Date 2023-08-30 09:00
Creator Vinciguerra, Giorgio, giorgio.vinciguerra@unipi.it, orcid.org/0000-0003-0328-7791
Dataset Citation https://doi.org/10.4230/LIPIcs.ESA.2023.46
Dataset Re-Use Safeguards /
Field/Scope of use Any use
Group Health Studies
Language eng, English
License term 2023-11-30 09:00/2999-12-31 23:59
Manifestation Type Virtual
Processing Degree Secondary
Retention Period 2023-11-30 09:00/2999-12-31 23:59
SoBigData Node SoBigData IT
Sublicense rights No
Territory of use World Wide
Thematic Cluster Other
Time Coverage 2023-11-30 09:00/2999-12-31 23:59
system:type Dataset
Management Info
Field Value
Author Vinciguerra Giorgio
Maintainer Vinciguerra Giorgio
Version 1
Last Updated 6 December 2023, 10:46 (CET)
Created 30 November 2023, 18:56 (CET)