Database Paper Browser

Approximate Query Processing: Taming the TeraBytes! A Tutorial

Summary: A survey of approximate query processing over terabyte-scale data, contrasting online aggregation with precomputed synopses as two ways to return fast answers with error bounds. It covers multi-dimensional and join synopses, set-valued queries, AQUA-style query rewriting, synopsis maintenance, and streaming data. (summarized by gpt-5-nano on Feb 09 2026)
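To make the online-aggregation idea in the summary concrete, here is a minimal sketch of a running AVG estimate with a confidence interval that tightens as more rows are scanned. It is not taken from the tutorial: the function name `running_avg_estimates`, its parameters, and the CLT-based 95% interval (z = 1.96, no finite-population correction) are assumptions chosen for illustration.

```python
import math
import random


def running_avg_estimates(values, batch_size, z=1.96, seed=0):
    """Yield (rows_seen, estimate, half_width) after each batch of a random scan.

    Hypothetical helper: approximates AVG(values) from a growing uniform sample
    and reports a CLT-based interval estimate +/- half_width.
    """
    rng = random.Random(seed)
    order = list(values)
    rng.shuffle(order)  # random scan order, so every prefix is a uniform sample

    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's algorithm)
    total = len(order)
    for n, x in enumerate(order, start=1):
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n % batch_size == 0 or n == total:
            variance = m2 / (n - 1) if n > 1 else 0.0
            # Simplification: ignores the finite-population correction,
            # so the interval does not collapse to zero at a full scan.
            half_width = z * math.sqrt(variance / n)
            yield n, mean, half_width


if __name__ == "__main__":
    data = list(range(1, 1001))
    for rows_seen, estimate, half_width in running_avg_estimates(data, 250):
        print(f"{rows_seen:5d} rows: AVG ~ {estimate:.2f} +/- {half_width:.2f}")
```

A precomputed-synopsis approach would instead draw the sample (or build a histogram or wavelet summary) ahead of time and answer each query from that fixed summary, trading the ability to refine the answer for a predictable response time.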

Paper ID: 8824
Venue: VLDB
Year: 2001
Pagerank: 0.00022846068
Overall Rank: 449 | 96.88%
DOI: -

Incoming Citations (Sorted by Pagerank)

Showing 41 of 41 citing papers.

Rank Citing Paper Year Venue Pagerank
149 Trio: A System for Integrated Management of Data, Accuracy, and Lineage 2005 CIDR 0.00041101118
475 Mining Database Structure; Or, How to Build a Data Quality Browser 2002 SIGMOD 0.00022303253
477 Model-Driven Data Acquisition in Sensor Networks 2004 VLDB 0.00022221803
696 BlazeIt: Optimizing Declarative Aggregation and Limit Queries for Neural Network-Based Video Analytics 2020 VLDB 0.00018048935
905 The Design of an Acquisitional Query Processor For Sensor Networks 2003 SIGMOD 0.0001546195
1,064 Processing Complex Aggregate Queries over Data Streams 2002 SIGMOD 0.00014356481
1,152 Blink and It's Done: Interactive Queries on Very Large Data 2012 VLDB 0.00013645792
1,420 Data Management Challenges in Production Machine Learning 2017 SIGMOD 0.00012057956
1,574 Approximate Query Processing: No Silver Bullet 2017 SIGMOD 0.00011287495
2,011 Rapid Sampling for Visualizations with Ordering Guarantees 2015 VLDB 9.7964875e-05
2,118 Using Probabilistic Models for Data Management in Acquisitional Environments 2005 CIDR 9.5100739e-05
2,184 A Sample-and-Clean Framework for Fast and Accurate Query Processing on Dirty Data 2014 SIGMOD 9.3429789e-05
2,492 Partial Results for Online Query Processing 2002 SIGMOD 8.6526489e-05
2,580 Sample + Seek: Approximating Aggregates with Distribution Precision Guarantee 2016 SIGMOD 8.5058814e-05
2,863 Incremental and Approximate Inference for Faster Occlusion-based Deep CNN Explanations 2019 SIGMOD 7.9877991e-05
3,393 Lux: Always-on Visualization Recommendations for Exploratory Dataframe Workflows 2022 VLDB 7.1483239e-05
3,565 Cache-Craft: Managing Chunk-Caches for Efficient Retrieval-Augmented Generation 2025 SIGMOD 6.9655362e-05
3,835 I've Seen "Enough": Incrementally Improving Visualizations to Support Rapid Decision Making 2017 VLDB 6.7163364e-05
4,014 Exploiting Correlations for Expensive Predicate Evaluation 2015 SIGMOD 6.5273084e-05
4,668 PrivateClean: Data Cleaning and Differential Privacy 2016 SIGMOD 6.0115918e-05
4,716 Mining Graph Patterns Efficiently via Randomized Summaries 2009 VLDB 5.9755569e-05
4,909 A Method for Optimizing Opaque Filter Queries 2020 SIGMOD 5.8338804e-05
5,140 A Random Walk Approach to Sampling Hidden Databases 2007 SIGMOD 5.668209e-05
6,330 Efficient Construction of Approximate Ad-Hoc ML models Through Materialization and Reuse 2018 VLDB 5.1077416e-05
6,838 Capturing Data Uncertainty in High-Volume Stream Processing 2009 CIDR 4.9109732e-05
7,477 Benchmarking Spreadsheet Systems 2020 SIGMOD 4.7188671e-05
7,890 Mining a Search Engine’s Corpus: Efficient Yet Unbiased Sampling and Aggregate Estimation 2011 SIGMOD 4.6249533e-05
8,728 Stale View Cleaning: Getting Fresh Answers from Stale Materialized Views 2015 VLDB 4.4589711e-05
8,851 Efficient Approximations of Conjunctive Queries 2012 PODS 4.4363908e-05
9,432 Aggregate Estimation Over Dynamic Hidden Web Databases 2014 VLDB 4.3431757e-05
9,614 Auto-Approximation of Graph Computing 2014 VLDB 4.3177432e-05
9,992 Supporting Our AI Overlords: Redesigning Data Systems to be Agent-First 2026 CIDR 4.1945683e-05
10,049 Approximate Query Processing under Updates 2026 SIGMOD 4.1945683e-05
10,116 Stochastic Submodular Data Forgetting 2026 SIGMOD 4.1945683e-05
10,215 Task Cascades for Efficient Unstructured Data Processing 2026 SIGMOD 4.1945683e-05
10,886 FaDE: More Than a Million What-ifs Per Second 2025 VLDB 4.1945683e-05
11,502 In the Land of Data Streams where Synopses are Missing, One Framework to Bring Them All 2021 VLDB 4.1945683e-05
11,691 Enabling Data Science for the Majority 2019 VLDB 4.1945683e-05
11,832 A Study of Sorting Algorithms on Approximate Memory 2016 SIGMOD 4.1945683e-05
12,019 When Data Management Systems Meet Approximate Hardware: Challenges and Opportunities 2014 VLDB 4.1945683e-05
12,506 AQAX: A System for Approximate XML Query Answers 2006 VLDB 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 50 of 52 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank Cited Paper Year Venue Pagerank
1 Access Path Selection in a Relational Database Management System 1979 SIGMOD 0.0040449103
14 Online Aggregation 1997 SIGMOD 0.0010801504
18 On Random Sampling over Joins 1999 SIGMOD 0.00092385438
28 Accurate Estimation Of The Number Of Tuples Satisfying A Condition 1984 SIGMOD 0.00080435857
39 Statistical Estimators for Relational Algebra Expressions 1988 PODS 0.00074745564
59 Sampling-Based Estimation of the Number of Distinct Values of an Attribute 1995 VLDB 0.00064501896
64 Improved Histograms for Selectivity Estimation of Range Predicates 1996 SIGMOD 0.00063612837
92 Practical Selectivity Estimation through Adaptive Sampling 1990 SIGMOD 0.00051315959
99 On the Propagation of Errors in the Size of Join Results 1991 SIGMOD 0.00050022914
116 Equi-Depth Histograms For Estimating Selectivity Factors For Multi-Dimensional Queries 1988 SIGMOD 0.00046148737
126 Space-Efficient Online Computation of Quantile Summaries 2001 SIGMOD 0.00044744986
134 Processing Aggregate Relational Queries with Hard Time Constraints 1989 SIGMOD 0.00042452811
141 Selectivity Estimation Without the Attribute Value Independence Assumption 1997 VLDB 0.00041786333
184 New Sampling-Based Summary Statistics for Improving Approximate Query Answers 1998 SIGMOD 0.00036625711
211 Join Synopses for Approximate Query Answering 1999 SIGMOD 0.00033981214
217 Ripple Joins for Online Aggregation 1999 SIGMOD 0.00033536712
222 Wavelet-Based Histograms for Selectivity Estimation 1998 SIGMOD 0.00032828302
237 An Efficient, Cost-Driven Index Selection Tool for Microsoft SQL Server 1997 VLDB 0.00031726304
252 Adaptive Selectivity Estimation Using Query Feedback 1994 SIGMOD 0.00030632263
269 Fast Incremental Maintenance of Approximate Histograms 1997 VLDB 0.00029656549
273 Approximate Computation of Multidimensional Aggregates of Sparse Data Using Wavelets 1999 SIGMOD 0.00029390945
308 Distinct Sampling for Highly-Accurate Answers to Distinct Values Queries and Event Reports 2001 VLDB 0.00028142852
326 Optimal Histograms with Quality Guarantees 1998 VLDB 0.00027358981
327 Balancing Histogram Optimality and Practicality for Query Result Size Estimation 1995 SIGMOD 0.00027308479
344 Surfing Wavelets on Streams: One-Pass Summaries for Approximate Aggregate Queries 2001 VLDB 0.00026702512
361 Histogram-Based Approximation of Set-Valued Query Answers 1999 VLDB 0.00025775749
372 Selectivity Estimation using Probabilistic Models 2001 SIGMOD 0.00025354779
378 Towards Estimation Error Guarantees for Distinct Values 2000 PODS 0.0002497492
405 Approximate Query Processing Using Wavelets 2000 VLDB 0.00024057494
512 STHoles: A Multidimensional Workload-Aware Histogram 2001 SIGMOD 0.00021380733
516 AutoAdmin "What-if" Index Analysis Utility 1998 SIGMOD 0.00021196031
529 Self-tuning Histograms: Building Histograms Without Looking at Data 1999 SIGMOD 0.00020828852
530 Random Sampling for Histogram Construction: How much is enough? 1998 SIGMOD 0.00020803682
549 Tracking Join and Self-Join Sizes in Limited Storage 1999 PODS 0.00020376603
553 Bifocal Sampling for Skew-Resistant Join Size Estimation 1996 SIGMOD 0.00020272061
619 On Computing Correlated Aggregates Over Continual Data Streams 2001 SIGMOD 0.00019066583
647 Progressive Approximate Aggregate Queries with a Multi-Resolution Tree Structure 2001 SIGMOD 0.00018668224
693 Efficiently Supporting Ad Hoc Queries in Large Datasets of Time Sequences 1997 SIGMOD 0.00018077335
739 Congressional Samples for Approximate Answering of Group-By Queries 2000 SIGMOD 0.00017401518
808 Universality of Serial Histograms 1993 VLDB 0.00016432772
842 Independence is Good: Dependency-Based Histogram Synopses for High-Dimensional Data 2001 SIGMOD 0.00016031973
996 Approximating Multi-Dimensional Aggregate Range Queries Over Real Attributes 2000 SIGMOD 0.00014741524
1,127 Dynamic Maintenance of Wavelet-Based Histograms 2000 VLDB 0.00013819179
1,241 Multi-dimensional Selectivity Estimation Using Compressed Histogram Information 1999 SIGMOD 0.00013097578
1,335 ICICLES: Self-tuning Samples for Approximate Query Answering 2000 VLDB 0.00012502131
1,598 Semantic Compression and Pattern Extraction with Fascicles 1999 VLDB 0.00011202905
1,695 Combining Histograms and Parametric Curve Fitting for Feedback-Driven Query Result-Size Estimation 1999 VLDB 0.00010882793
2,808 A Robust, Optimization-Based Approach for Approximate Answering of Aggregate Queries 2001 SIGMOD 8.0870741e-05
2,908 SPARTAN: A Model-Based Semantic Compression System for Massive Data Tables 2001 SIGMOD 7.9306333e-05
3,310 Optimal and Approximate Computation of Summary Statistics for Range Aggregates 2001 PODS 7.2408955e-05

Semantically Similar Papers