DNA 12-mers

A 179 MB dataset containing all the ~14M unique 12-mers (substrings of length 12) in the DNA sequences available in the Pizza&Chili Corpus (https://pizzachili.dcc.uchile.cl/texts.html).
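
A k-mer is a substring of length k, so the unique 12-mers of a text are the distinct length-12 windows obtained by sliding over it one character at a time. The Python sketch below illustrates that definition on a toy string. It is not the procedure used to build this dataset, which this page does not describe; the function name `unique_kmers` and the sorted output order are assumptions made for illustration.

```python
def unique_kmers(text: str, k: int = 12) -> list[str]:
    """Return the distinct length-k substrings of text, sorted lexicographically."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Slide a window of width k over the text; the set drops repeated windows.
    # If the text is shorter than k, the range is empty and so is the result.
    kmers = {text[i:i + k] for i in range(len(text) - k + 1)}
    return sorted(kmers)


# Toy example with k = 4 so the output stays readable.
print(unique_kmers("ACGTACGT", k=4))  # ['ACGT', 'CGTA', 'GTAC', 'TACG']
```

The toy input has five windows of width 4, of which the first and last coincide, so four unique 4-mers remain.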

This dataset has been used for the evaluation of learned string indexes (https://doi.org/10.1109/ACCESS.2023.3295434).

Number of newline-separated strings: 13745061
Size of the zip-compressed dataset: 30773793 bytes (30.77 MB)
Size of the uncompressed dataset: 178685793 bytes (178.69 MB)
Encoding: ASCII
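
The figures above can be cross-checked with a little arithmetic: if every line holds 12 one-byte ASCII characters followed by a single newline, the uncompressed size should equal the number of strings times 13. The Python sketch below performs that check and also computes the compression ratio. The one-newline-per-line layout is an assumption inferred from the numbers, and the helper `count_valid_kmers` is a hypothetical name for validating a local copy, not part of any published tooling.

```python
K = 12
NUM_STRINGS = 13_745_061
UNCOMPRESSED_BYTES = 178_685_793
COMPRESSED_BYTES = 30_773_793

# Assumption: each line is K one-byte ASCII characters plus a single "\n".
expected_bytes = NUM_STRINGS * (K + 1)
sizes_agree = expected_bytes == UNCOMPRESSED_BYTES
compression_ratio = UNCOMPRESSED_BYTES / COMPRESSED_BYTES

print(expected_bytes, sizes_agree)  # 178685793 True
print(f"compression ratio: {compression_ratio:.2f}")


def count_valid_kmers(lines, k=K):
    """Count lines, raising if any line is not exactly k ASCII characters."""
    count = 0
    for line in lines:
        kmer = line.rstrip("\n")
        if len(kmer) != k or not kmer.isascii():
            raise ValueError(f"unexpected line: {kmer!r}")
        count += 1
    return count
```

After downloading and unzipping the resource, passing an open text-mode file handle to `count_valid_kmers` should return 13745061 if the local copy matches the description; the name of the extracted file is not stated on this page.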

Data and Resources
  • DNA 12-mers (ZIP)

Personal Data Attributes

Description: Personal Data related Information

Field Value
Anonymised No
ChildrenData No
Cross Border Authorised Yes
Data Protection Impact Assessment No
Ethics Committee Approval No
General Data Yes
Informed Consent Template No
Personal Data No
Personal data was manifestly made public by the data subject No
Sensitive Data No
Additional Info
Field Value
Accessibility Both
Accessibility Mode Download
Availability On-Line
Basic rights Download
Basic rights Distribution
Basic rights Modification
Creation Date 2023-07-14 12:00
Creator Vinciguerra, Giorgio, giorgio.vinciguerra@unipi.it, orcid.org/0000-0003-0328-7791
Dataset Citation https://doi.org/10.1109/ACCESS.2023.3295434
Dataset Re-Use Safeguards /
DiskSize 30.77 MB
Field/Scope of use Any use
Format txt
Group Health Studies
Language eng, English
License term 2023-11-30 09:00/2999-12-31 23:59
Manifestation Type Original
Processing Degree Secondary
Retention Period 2023-11-30/2999-12-31 23:59
SoBigData Node SoBigData EU
SoBigData Node SoBigData IT
Sublicense rights No
Territory of use World Wide
Thematic Cluster Other
system:type Dataset
Management Info
Field Value
Author Vinciguerra Giorgio
Maintainer Vinciguerra Giorgio
Version 1
Last Updated 1 December 2023, 09:10 (CET)
Created 30 November 2023, 18:35 (CET)