On the performance of learned data structures

A recent trend in algorithm design consists of augmenting classic data structures with machine learning models, which are better suited to reveal and exploit patterns and trends in the input data so to achieve outstanding practical improvements in space occupancy and time efficiency. This is especially known in the context of indexing data structures for big data where, despite few attempts in evaluating their asymptotic efficiency, theoretical results are yet missing in showing that learned indexes are provably better than classic indexes, such asB-trees and their variants. In this paper, we present the first mathematically-grounded answer to this problem by exploiting a link with a mean exit time problem over a proper stochastic process which, we show, is related to the space and time complexity of these learned indexes. As a corollary of this general analysis, we show that plugging this result in the (learned) PGM-index, we get a learned data structure which is provably better than B-trees.

Tags
Data and Resources
To access the resources you must log in
  • Link to Publication

    The resource: 'Link to Publication' is not accessible as guest user. You must login to access it!
Additional Info
Field Value
Creator Ferragina, Paolo paolo.ferragina@unipi.it
Creator Lillo, Fabrizio fabrizio.lillo@sns.it
Creator Vinciguerra, Giorgio giorgio.vinciguerra@phd.unipi.it
DOI https://doi.org/10.1016/j.tcs.2021.04.015
Group Others
Publisher Theoretical Computer Science (Elsevier)
Source Theoretical Computer Science 871 (2021) 107-120
Thematic Cluster Other
system:type JournalArticle
Management Info
Field Value
Author Joanna Wright
Maintainer Fabrizio Lillo
Version 1
Last Updated 24 May 2021, 11:17 (CEST)
Created 24 May 2021, 11:17 (CEST)