Database Paper Browser

Sample + Seek: Approximating Aggregates with Distribution Precision Guarantee

Summary: Proposes distribution precision as a strict error guarantee for approximate query processing (AQP) of group-by aggregates, targeting the accuracy of the overall answer distribution rather than of individual point estimates. Introduces measure-biased sampling and two in-memory indexes to support selective predicates and any aggregate estimate, delivering ~100x speedups over a commercial DB at ~5% distribution error. (summarized by gpt-5-nano on Feb 09 2026)
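
The two ideas named in the summary can be illustrated with a small sketch. It is not the paper's implementation. It assumes measure-biased sampling means drawing rows with probability proportional to a non-negative measure column, and that distribution error is the L2 distance between the exact and approximate answer vectors after each is normalized to sum to 1; the paper's exact definitions may differ. All function names and the toy table are hypothetical.

```python
import math
import random
from collections import Counter


def measure_biased_sample(rows, measure, m, seed=0):
    """Draw m rows with replacement, each with probability proportional
    to its (non-negative) measure value. Assumed sampling scheme."""
    rng = random.Random(seed)
    weights = [row[measure] for row in rows]
    return rng.choices(rows, weights=weights, k=m)


def estimate_group_sums(sample, group_key, total_measure):
    """Estimate SUM(measure) per group. Under measure-biased sampling,
    every sampled row stands for total_measure / m units of the measure."""
    counts = Counter(row[group_key] for row in sample)
    m = len(sample)
    return {g: total_measure * c / m for g, c in counts.items()}


def distribution_error(exact, approx):
    """L2 distance between the two answers after normalizing each to sum
    to 1 (an assumed form of the distribution-precision metric)."""
    groups = set(exact) | set(approx)
    exact_total = sum(exact.values())
    approx_total = sum(approx.values())
    return math.sqrt(sum(
        (exact.get(g, 0.0) / exact_total - approx.get(g, 0.0) / approx_total) ** 2
        for g in groups
    ))


# Hypothetical toy table: SELECT region, SUM(sales) ... GROUP BY region
rows = [
    {"region": "east", "sales": 90.0},
    {"region": "east", "sales": 10.0},
    {"region": "west", "sales": 100.0},
]
total = sum(r["sales"] for r in rows)
sample = measure_biased_sample(rows, "sales", m=1000)
approx = estimate_group_sums(sample, "region", total)
exact = {"east": 100.0, "west": 100.0}
print(distribution_error(exact, approx))
```

Because each sampled row carries the same share of the total, the per-group estimate reduces to a count of sampled rows, which is why large-measure rows are the ones most likely to be kept.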

Paper ID: 5276
Venue: SIGMOD
Year: 2016
Pagerank: 8.5058814e-05
Overall Rank: 2,580 | 82.06%
DOI: 10.1145/2882903.2915249

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 29 of 29 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
1,369 | Random Sampling over Joins Revisited | 2018 | SIGMOD | 0.00012339777
1,427 | Towards Scalable Dataframe Systems | 2020 | VLDB | 0.0001204248
1,574 | Approximate Query Processing: No Silver Bullet | 2017 | SIGMOD | 0.00011287495
3,702 | Every Row Counts: Combining Sketches and Sampling for Accurate Group-By Result Estimates | 2019 | CIDR | 6.8295759e-05
3,944 | AQP++: Connecting Approximate Query Processing With Aggregate Precomputation for Interactive Analytics | 2018 | SIGMOD | 6.6078243e-05
4,375 | Sample Debiasing in the Themis Open World Database System | 2020 | SIGMOD | 6.2427076e-05
4,536 | Data Series Progressive Similarity Search with Probabilistic Quality Guarantees | 2020 | SIGMOD | 6.104642e-05
4,681 | Adaptive Sampling for Rapidly Matching Histograms | 2018 | VLDB | 6.0034918e-05
5,909 | At-the-time and Back-in-time Persistent Sketches | 2021 | SIGMOD | 5.2769377e-05
6,296 | Visualization-aware Time Series Min-Max Caching with Error Bound Guarantees | 2024 | VLDB | 5.1249171e-05
6,298 | Hillview: A trillion-cell spreadsheet for big data | 2019 | VLDB | 5.1226987e-05
6,740 | Combining Aggregation and Sampling (Nearly) Optimally for Approximate Query Processing | 2021 | SIGMOD | 4.944395e-05
6,842 | Towards Democratizing Relational Data Visualization | 2019 | SIGMOD | 4.9103931e-05
6,907 | Continuous Prefetch for Interactive Data Applications | 2020 | VLDB | 4.8925595e-05
7,073 | Marviq: Quality-Aware Geospatial Visualization of Range-Selection Queries Using Materialization | 2020 | SIGMOD | 4.842703e-05
7,395 | MOST: Model-Based Compression with Outlier Storage for Time Series Data | 2023 | SIGMOD | 4.7420041e-05
7,872 | Probabilistic Database Summarization for Interactive Data Exploration | 2017 | VLDB | 4.6307184e-05
8,080 | Biathlon: Harnessing Model Resilience for Accelerating ML Inference Pipelines | 2024 | VLDB | 4.5911668e-05
8,240 | Experiences with Approximating Queries in Microsoft’s Production Big-Data Clusters | 2019 | VLDB | 4.5522563e-05
8,643 | One Size Does Not Fit All: A Bandit-Based Sampler Combination Framework with Theoretical Guarantees | 2022 | SIGMOD | 4.4777916e-05
8,715 | Data Driven Approximation with Bounded Resources | 2017 | VLDB | 4.4619052e-05
9,621 | ShadowAQP: Efficient Approximate Group-by and Join Query via Attribute-oriented Sample Size Allocation and Data Generation | 2023 | VLDB | 4.3167167e-05
9,758 | Practical Dynamic Extension for Sampling Indexes | 2023 | SIGMOD | 4.2879116e-05
9,949 | AB-tree: Index for Concurrent Random Sampling and Updates | 2022 | VLDB | 4.2421586e-05
10,127 | Visualization-Oriented Progressive Time Series Transformation | 2026 | SIGMOD | 4.1945683e-05
10,254 | Secure Multi-Party Sampling over Joins | 2026 | VLDB | 4.1945683e-05
10,481 | FAAQP: Fast and Accurate Approximate Query Processing based on Bitmap-augmented Sum-Product Network | 2025 | SIGMOD | 4.1945683e-05
11,285 | Approximate Queries over Concurrent Updates | 2023 | VLDB | 4.1945683e-05
11,539 | FlashP: An Analytical Pipeline for Real-time Forecasting of Time-Series Relational Data | 2021 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 20 of 20 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
14 | Online Aggregation | 1997 | SIGMOD | 0.0010801504
158 | Automated Selection of Materialized Views and Indexes for SQL Databases | 2000 | VLDB | 0.00040071492
273 | Approximate Computation of Multidimensional Aggregates of Sparse Data Using Wavelets | 1999 | SIGMOD | 0.00029390945
378 | Towards Estimation Error Guarantees for Distinct Values | 2000 | PODS | 0.0002497492
405 | Approximate Query Processing Using Wavelets | 2000 | VLDB | 0.00024057494
429 | The Aqua Approximate Query Answering System | 1999 | SIGMOD | 0.00023476494
449 | Approximate Query Processing: Taming the TeraBytes! A Tutorial | 2001 | VLDB | 0.00022846068
739 | Congressional Samples for Approximate Answering of Group-By Queries | 2000 | SIGMOD | 0.00017401518
1,260 | Dynamic Sample Selection for Approximate Query Processing | 2003 | SIGMOD | 0.00012993347
1,335 | ICICLES: Self-tuning Samples for Approximate Query Answering | 2000 | VLDB | 0.00012502131
1,464 | Online Aggregation for Large MapReduce Jobs | 2011 | VLDB | 0.00011865546
1,874 | Knowing When You’re Wrong: Building Fast and Reliable Approximate Query Processing Systems | 2014 | SIGMOD | 0.00010244443
1,909 | SciBORQ: Scientific data management with Bounds On Runtime and Quality | 2011 | CIDR | 0.00010121304
2,011 | Rapid Sampling for Visualizations with Ordering Guarantees | 2015 | VLDB | 9.7964875e-05
2,365 | The Analytical Bootstrap: a New Method for Fast Error Estimation in Approximate Query Processing | 2014 | SIGMOD | 8.9551432e-05
2,616 | DAQ: A New Paradigm for Approximate Query Processing | 2015 | VLDB | 8.4471955e-05
2,808 | A Robust, Optimization-Based Approach for Approximate Answering of Aggregate Queries | 2001 | SIGMOD | 8.0870741e-05
3,310 | Optimal and Approximate Computation of Summary Statistics for Range Aggregates | 2001 | PODS | 7.2408955e-05
4,052 | Interactive Analysis of Web-Scale Data | 2009 | CIDR | 6.4936745e-05
5,879 | Fast and Near–Optimal Algorithms for Approximating Distributions by Histograms | 2015 | PODS | 5.2908101e-05

Semantically Similar Papers