Sieve: A Learned Data-Skipping Index for Data Analytics
Summary: Sieve is a learned data-skipping index that models block-level value distributions with piecewise-linear functions, capturing real-world patterns that per-block min/max metadata and histograms miss. By grouping adjacent keys into regions and trading storage for fewer false positives, Sieve reduces the number of blocks accessed by up to 80% and query time by 42% in Presto evaluations.
(summarized by gpt-5-mini on Feb 09 2026)
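To make the idea of trading storage for fewer false positives concrete, here is a minimal sketch in Python. It is not the paper's algorithm: it leaves out the piecewise-linear modeling entirely and replaces it with a plain lookup table that maps fixed-width key regions to the set of blocks containing any key in that region. The function names, the `region_width` parameter, and the toy data are all hypothetical, chosen only for illustration, and keys are assumed to be non-negative integers.

```python
from collections import defaultdict


def build_zone_maps(blocks):
    """Per-block (min, max) metadata: the baseline skipping structure."""
    return [(min(block), max(block)) for block in blocks]


def zone_map_candidates(zone_maps, value):
    """Blocks that cannot be skipped for a point query under min/max metadata."""
    return {i for i, (lo, hi) in enumerate(zone_maps) if lo <= value <= hi}


def build_region_index(blocks, region_width):
    """Hypothetical stand-in for a learned index.

    Adjacent keys are grouped into fixed-width regions (key // region_width),
    and each region records every block that holds at least one of its keys.
    """
    index = defaultdict(set)
    for block_id, block in enumerate(blocks):
        for value in block:
            index[value // region_width].add(block_id)
    return dict(index)


def region_candidates(index, region_width, value):
    """Blocks that cannot be skipped for a point query under the region index."""
    return index.get(value // region_width, set())


def true_blocks(blocks, value):
    """Ground truth: blocks that actually contain the value."""
    return {i for i, block in enumerate(blocks) if value in block}


# Toy data (an assumption): mostly clustered values with two outliers (95 and 5)
# that stretch the min/max range of their blocks.
blocks = [
    [1, 2, 3, 95],
    [10, 11, 12, 13],
    [20, 21, 22, 5],
    [30, 31, 32, 33],
]

zone_maps = build_zone_maps(blocks)
coarse = build_region_index(blocks, region_width=10)
fine = build_region_index(blocks, region_width=4)

for query in (11, 5):
    print(
        query,
        sorted(zone_map_candidates(zone_maps, query)),
        sorted(region_candidates(coarse, 10, query)),
        sorted(region_candidates(fine, 4, query)),
        sorted(true_blocks(blocks, query)),
    )
print(len(coarse), len(fine))  # index entries: the storage side of the trade-off
```

In this toy setup, a query for 11 leaves three candidate blocks under min/max metadata but only one under the region index, because the outliers widen the min/max ranges without touching the region that holds 11. Narrowing the regions from width 10 to width 4 removes the remaining false positive for the query 5, at the cost of more index entries (8 instead of 5). Neither structure ever skips a block that actually contains the queried value.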
- Paper ID: 13158
- Venue: VLDB
- Year: 2023
- Pagerank: 4.5555621e-05
- Overall Rank: 8,222 | 42.81%
- DOI: 10.14778/3611479.3611520
Incoming Non-self Citations Over Time
Incoming Citations (Sorted by Pagerank)
Showing 4 of 4 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 20 of 20 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 102 | The Case for Learned Index Structures | 2018 | SIGMOD | 0.00049545203 |
| 241 | DB2 with BLU Acceleration: So Much More than Just a Column Store | 2013 | VLDB | 0.00031420034 |
| 368 | Small Materialized Aggregates: A Light Weight Index Structure for Data Warehousing | 1998 | VLDB | 0.000254931 |
| 826 | ALEX: An Updatable Adaptive Learned Index | 2020 | SIGMOD | 0.00016224841 |
| 857 | The PGM-index: a fully-dynamic compressed learned index with provable worst-case bounds | 2020 | VLDB | 0.00015882892 |
| 1,375 | FITing-Tree: A Data-aware Index Structure | 2019 | SIGMOD | 0.00012303141 |
| 1,913 | BF-Tree: Approximate Tree Indexing | 2014 | VLDB | 0.00010113937 |
| 1,989 | Column Imprints: A Secondary Index Structure | 2013 | SIGMOD | 9.8478437e-05 |
| 2,140 | Online Piece-wise Linear Approximation of Numerical Streams with Precision Guarantees | 2009 | VLDB | 9.4626098e-05 |
| 3,152 | AnalyticDB: Real-time OLAP Database System at Alibaba Cloud | 2019 | VLDB | 7.4711766e-05 |
| 3,608 | Column Sketches: A Scan Accelerator for Rapid and Robust Predicate Evaluation | 2018 | SIGMOD | 6.924272e-05 |
| 3,737 | Skipping-oriented Partitioning for Columnar Layouts | 2017 | VLDB | 6.8033227e-05 |
| 3,891 | Slalom: Coasting Through Raw Data via Adaptive Partitioning and Indexing | 2017 | VLDB | 6.659442e-05 |
| 3,912 | Two Birds, One Stone: A Fast, yet Lightweight, Indexing Scheme for Modern Database Systems | 2017 | VLDB | 6.6354964e-05 |
| 3,922 | Pushing Data-Induced Predicates Through Joins in Big-Data Clusters | 2020 | VLDB | 6.6291079e-05 |
| 4,158 | Performance-Optimal Filtering: Bloom Overtakes Cuckoo at High Throughput | 2019 | VLDB | 6.3994318e-05 |
| 5,315 | Cuckoo Index: A Lightweight Secondary Index Structure | 2020 | VLDB | 5.5723424e-05 |
| 5,428 | The Price of Tailoring the Index to Your Data: Poisoning Attacks on Learned Index Structures | 2022 | SIGMOD | 5.5091613e-05 |
| 6,850 | Petabyte Scale Databases and Storage Systems at Facebook | 2013 | SIGMOD | 4.9085019e-05 |
| 9,665 | Fingerprints for Compressed Columnar Data Search | 2019 | SIGMOD | 4.3082524e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 4,994 | Stacked Filters: Learning to Filter by Structure | 2021 | VLDB | 5.78027e-05 |
| 8,076 | Accelerating String-key Learned Index Structures via Memoization-based Incremental Training | 2024 | VLDB | 4.5917398e-05 |
| 6,809 | Adaptive Data Skipping in Main-Memory Systems | 2016 | SIGMOD | 4.9206606e-05 |
| 5,315 | Cuckoo Index: A Lightweight Secondary Index Structure | 2020 | VLDB | 5.5723424e-05 |
| 11,993 | A Partitioning Framework for Aggressive Data Skipping | 2014 | VLDB | 4.1945683e-05 |
| 10,385 | Optimizing Block Skipping for High-Dimensional Data with Learned Adaptive Curve | 2025 | SIGMOD | 4.1945683e-05 |
| 9,373 | S3: A Scalable In-memory Skip-List Index for Key-Value Store | 2019 | VLDB | 4.3479874e-05 |
| 8,411 | Sieve: A Middleware Approach to Scalable Access Control for Database Management Systems | 2020 | VLDB | 4.5204669e-05 |
| 10,761 | SIEVE: Effective Filtered Vector Search with Collection of Indexes | 2025 | VLDB | 4.1945683e-05 |
| 10,034 | SieveSketch: A Fine-grained and Adaptive Sketch Framework for Accurate Frequency Estimation | 2026 | SIGMOD | 4.1945683e-05 |