Lightweight Cardinality Estimation in LSM-based Systems
Summary: Proposes lightweight statistics collection for LSM-based stores that piggybacks on flush and merge operations, so the statistics stay up to date under high ingestion rates. Uses equi-width histograms, equi-height histograms, and wavelets for cardinality estimation. Implemented in Apache AsterixDB and evaluated for estimation accuracy and overhead.
(summarized by gpt-5-nano on Feb 09 2026)
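The summary describes statistics built as a side effect of flush and merge. Both operations write records in sorted key order, so an equi-height histogram (one of the synopsis types named above) can be built in a single pass with no extra sort. The Python sketch below is a minimal illustration under stated assumptions, not AsterixDB's implementation. The function names (`build_on_flush`, `build_on_merge`, `estimate_across_components`) are hypothetical. It assumes integer keys and a uniform spread of keys within each bucket. It also combines components by simply summing their estimates, which ignores deleted or updated records that appear in several components.

```python
import heapq
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class EquiHeightHistogram:
    """Bucket i covers integer keys in (bounds[i-1], bounds[i]]; the first bucket starts at lo."""
    lo: int
    bounds: List[int]   # inclusive upper key of each bucket
    counts: List[int]   # number of records in each bucket

    def estimate(self, a: int, b: int) -> float:
        """Estimate records with a <= key <= b, assuming keys are spread uniformly inside a bucket."""
        total = 0.0
        prev = self.lo - 1  # exclusive lower edge of the first bucket
        for upper, count in zip(self.bounds, self.counts):
            lo_edge, hi_edge = prev + 1, upper
            width = hi_edge - lo_edge + 1
            overlap = min(b, hi_edge) - max(a, lo_edge) + 1
            if overlap > 0:
                total += count * overlap / width
            prev = upper
        return total


def build_on_flush(sorted_keys: Sequence[int], num_buckets: int) -> EquiHeightHistogram:
    """Hypothetical hook: one pass over keys the flush already emits in sorted order."""
    if not sorted_keys:
        return EquiHeightHistogram(lo=0, bounds=[], counts=[])
    n = len(sorted_keys)
    target = max(1, -(-n // num_buckets))  # ceil(n / num_buckets) records per bucket
    bounds: List[int] = []
    counts: List[int] = []
    count = 0
    for i, key in enumerate(sorted_keys):
        count += 1
        is_last = i == n - 1
        # Close the bucket once it is full, but never split equal keys across two buckets.
        if is_last or (count >= target and sorted_keys[i + 1] != key):
            bounds.append(key)
            counts.append(count)
            count = 0
    return EquiHeightHistogram(lo=sorted_keys[0], bounds=bounds, counts=counts)


def build_on_merge(components: List[Sequence[int]], num_buckets: int) -> EquiHeightHistogram:
    """Hypothetical hook: a merge streams its sorted inputs in key order, so reuse the flush builder."""
    return build_on_flush(list(heapq.merge(*components)), num_buckets)


def estimate_across_components(histograms: Sequence[EquiHeightHistogram], a: int, b: int) -> float:
    """Assumed combination rule: sum per-component estimates (ignores cross-component deletes)."""
    return sum(h.estimate(a, b) for h in histograms)


if __name__ == "__main__":
    h = build_on_flush(list(range(100)), 4)
    print(h.bounds, h.counts)
    print(h.estimate(0, 49))
```

In this sketch no statistics work happens on the insert path: a histogram is produced only when a component is written, so the added cost is one extra pass over data the flush or merge is already streaming.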
- Paper ID: 5481
- Venue: SIGMOD
- Year: 2018
- Pagerank: 5.4539235e-05
- Overall Rank: 5,535 | 61.50%
- DOI: 10.1145/3183713.3183761
Incoming Citations (Sorted by Pagerank)
Showing 12 of 12 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 4,833 | MNC: Structure-Exploiting Sparsity Estimation for Matrix Expressions | 2019 | SIGMOD | 5.8916346e-05 |
| 6,231 | An LSM-based Tuple Compaction Framework for Apache AsterixDB | 2020 | VLDB | 5.1457863e-05 |
| 6,398 | Endure: A Robust Tuning Paradigm for LSM Trees Under Workload Uncertainty | 2022 | VLDB | 5.0819209e-05 |
| 7,271 | Comparing Synopsis Techniques for Approximate Spatial Data Analysis | 2019 | VLDB | 4.7813404e-05 |
| 7,620 | Learning to Optimize LSM-trees: Towards A Reinforcement Learning based Key-Value Store for Dynamic Workloads | 2023 | SIGMOD | 4.693568e-05 |
| 8,009 | CAMAL: Optimizing LSM-trees via Active Learning | 2024 | SIGMOD | 4.6066863e-05 |
| 8,339 | How to Grow an LSM-tree? Towards Bridging the Gap Between Theory and Practice | 2025 | SIGMOD | 4.5434069e-05 |
| 8,805 | ArceKV: Towards Workload-driven LSM-compactions for Key-Value Store Under Dynamic Workloads | 2026 | VLDB | 4.4466855e-05 |
| 9,237 | Determining Exact Quantiles with Randomized Summaries | 2024 | SIGMOD | 4.3690661e-05 |
| 9,317 | Are Joins over LSM-trees Ready? Take RocksDB as an Example | 2025 | VLDB | 4.3556432e-05 |
| 9,386 | Rethinking The Compaction Policies in LSM-trees | 2025 | SIGMOD | 4.3455975e-05 |
| 10,388 | Randomized Sketches for Quantile in LSM-tree based Store | 2025 | SIGMOD | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 24 of 24 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1 | Access Path Selection in a Relational Database Management System | 1979 | SIGMOD | 0.0040449103 |
| 64 | Improved Histograms for Selectivity Estimation of Range Predicates | 1996 | SIGMOD | 0.00063612837 |
| 70 | Hive - A Warehousing Solution Over a Map-Reduce Framework | 2009 | VLDB | 0.00059533166 |
| 71 | How Good Are Query Optimizers, Really? | 2016 | VLDB | 0.00059038975 |
| 99 | On the Propagation of Errors in the Size of Join Results | 1991 | SIGMOD | 0.00050022914 |
| 126 | Space-Efficient Online Computation of Quantile Summaries | 2001 | SIGMOD | 0.00044744986 |
| 182 | LEO - DB2's LEarning Optimizer | 2001 | VLDB | 0.00036962631 |
| 222 | Wavelet-Based Histograms for Selectivity Estimation | 1998 | SIGMOD | 0.00032828302 |
| 327 | Balancing Histogram Optimality and Practicality for Query Result Size Estimation | 1995 | SIGMOD | 0.00027308479 |
| 344 | Surfing Wavelets on Streams: One-Pass Summaries for Approximate Aggregate Queries | 2001 | VLDB | 0.00026702512 |
| 367 | Sequential Sampling Procedures For Query Size Estimation | 1992 | SIGMOD | 0.00025509745 |
| 405 | Approximate Query Processing Using Wavelets | 2000 | VLDB | 0.00024057494 |
| 429 | The Aqua Approximate Query Answering System | 1999 | SIGMOD | 0.00023476494 |
| 454 | An Overview of Query Optimization in Relational Systems | 1998 | PODS | 0.00022734812 |
| 476 | Impala: A Modern, Open-Source SQL Engine for Hadoop | 2015 | CIDR | 0.00022226941 |
| 512 | STHoles: A Multidimensional Workload-Aware Histogram | 2001 | SIGMOD | 0.00021380733 |
| 529 | Self-tuning Histograms: Building Histograms Without Looking at Data | 1999 | SIGMOD | 0.00020828852 |
| 1,127 | Dynamic Maintenance of Wavelet-Based Histograms | 2000 | VLDB | 0.00013819179 |
| 1,438 | AsterixDB: A Scalable, Open Source BDMS | 2014 | VLDB | 0.00011973592 |
| 2,021 | Storage Management in AsterixDB | 2014 | VLDB | 9.7601304e-05 |
| 3,013 | Cardinality Estimation Using Sample Views with Quality Assurance | 2007 | SIGMOD | 7.7137441e-05 |
| 3,066 | HAWQ: A Massively Parallel Processing SQL Engine in Hadoop | 2014 | SIGMOD | 7.6221974e-05 |
| 5,262 | SnappyData: A Hybrid Transactional Analytical Store Built On Spark | 2016 | SIGMOD | 5.5977349e-05 |
| 7,415 | Efficient and Scalable Statistics Gathering for Large Databases in Oracle 11g | 2008 | SIGMOD | 4.7355557e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 2,021 | Storage Management in AsterixDB | 2014 | VLDB | 9.7601304e-05 |
| 8,731 | Columnar Formats for Schemaless LSM-based Document Stores | 2022 | VLDB | 4.4577278e-05 |
| 10,388 | Randomized Sketches for Quantile in LSM-tree based Store | 2025 | SIGMOD | 4.1945683e-05 |
| 7,218 | Breaking Down Memory Walls in LSM-based Storage Systems | 2020 | SIGMOD | 4.7982543e-05 |
| 5,791 | Dissecting, Designing, and Optimizing LSM-based Data Stores | 2022 | SIGMOD | 5.3268999e-05 |
| 11,356 | Workload-Adaptive Filtering in Storage Engines | 2022 | SIGMOD | 4.1945683e-05 |
| 5,918 | Breaking Down Memory Walls: Adaptive Memory Management in LSM-based Storage Systems | 2021 | VLDB | 5.2737135e-05 |
| 6,231 | An LSM-based Tuple Compaction Framework for Apache AsterixDB | 2020 | VLDB | 5.1457863e-05 |
| 4,914 | On Performance Stability in LSM-based Storage Systems | 2020 | VLDB | 5.8315684e-05 |
| 7,743 | Efficient Data Ingestion and Query Processing for LSM-Based Storage Systems | 2019 | VLDB | 4.6626575e-05 |