Database Paper Browser

Online Aggregation for Large MapReduce Jobs

Summary: Brings online aggregation to MapReduce, providing progressive estimates with confidence bounds while a large-scale aggregation is still running. This enables pay-as-you-go cloud processing: a job can be stopped early once the estimate is accurate enough, which reduces the cost of big data jobs. (summarized by gpt-5-nano on Feb 09 2026)
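
To make the idea of a progressive estimate with early stopping concrete, here is a minimal single-machine sketch in Python. It is a generic textbook illustration of classical online aggregation (a running mean with a normal-approximation confidence interval), not the estimator from the paper, and it ignores everything specific to MapReduce. The function name online_mean, its parameters (z, rel_error, min_rows), and the synthetic data are all assumptions made for illustration.

```python
import math
import random


def online_mean(values, z=1.96, rel_error=0.01, min_rows=30):
    """Yield (rows_seen, estimate, half_width) as rows arrive.

    Hypothetical helper: stops once the confidence-interval half-width
    is at most rel_error * |estimate|. Assumes rows arrive in random order.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's algorithm)
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n < min_rows:
            continue  # too few rows for a meaningful interval
        sample_variance = m2 / (n - 1)
        half_width = z * math.sqrt(sample_variance / n)
        yield n, mean, half_width
        if half_width <= rel_error * abs(mean):
            return  # accurate enough: stop reading input


# Synthetic data (assumption): 100,000 values around 100.
rng = random.Random(0)
data = [rng.gauss(100.0, 15.0) for _ in range(100_000)]
rng.shuffle(data)  # random order makes every prefix a random sample

for rows_seen, estimate, half_width in online_mean(data):
    pass  # a real client would display each progressive estimate here

print(f"stopped after {rows_seen} of {len(data)} rows: "
      f"{estimate:.2f} +/- {half_width:.2f}")
```

The interval is only valid if the rows seen so far form a random sample of the input, which is why the sketch shuffles the data first. In a distributed job the order in which results arrive may not be random, so this simple estimator should not be assumed to carry over unchanged.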

Paper ID: 10176
Venue: VLDB
Year: 2011
Pagerank: 0.00011865546
Overall Rank: 1,464 | 89.82%
DOI: -

Incoming Citations (Sorted by Pagerank)

Showing 39 of 39 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
943 | Wander Join: Online Aggregation via Random Walks | 2016 | SIGMOD | 0.00015145883
1,204 | VerdictDB: Universalizing Approximate Query Processing | 2018 | SIGMOD | 0.00013319541
1,323 | Quickr: Lazily Approximating Complex AdHoc Queries in BigData Clusters | 2016 | SIGMOD | 0.00012601997
1,369 | Random Sampling over Joins Revisited | 2018 | SIGMOD | 0.00012339777
1,574 | Approximate Query Processing: No Silver Bullet | 2017 | SIGMOD | 0.00011287495
1,840 | dbTouch: Analytics at your Fingertips | 2013 | CIDR | 0.0001034905
1,874 | Knowing When You’re Wrong: Building Fast and Reliable Approximate Query Processing Systems | 2014 | SIGMOD | 0.00010244443
2,184 | A Sample-and-Clean Framework for Fast and Accurate Query Processing on Dirty Data | 2014 | SIGMOD | 9.3429789e-05
2,355 | G-OLA: Generalized On-Line Aggregation for Interactive Analysis on Big Data | 2015 | SIGMOD | 8.9677847e-05
2,365 | The Analytical Bootstrap: a New Method for Fast Error Estimation in Approximate Query Processing | 2014 | SIGMOD | 8.9551432e-05
2,501 | DBEst: Revisiting Approximate Query Processing Engines with Machine Learning Models | 2019 | SIGMOD | 8.6453446e-05
2,580 | Sample + Seek: Approximating Aggregates with Distribution Precision Guarantee | 2016 | SIGMOD | 8.5058814e-05
2,588 | Database Learning: Toward a Database that Becomes Smarter Every Time | 2017 | SIGMOD | 8.4909562e-05
2,674 | Minimal MapReduce Algorithms | 2013 | SIGMOD | 8.3328645e-05
3,034 | How to Fit when No One Size Fits | 2013 | CIDR | 7.6752083e-05
3,051 | Partial Results in Database Systems | 2014 | SIGMOD | 7.6512591e-05
3,279 | Early Accurate Results for Advanced Analytics on MapReduce | 2012 | VLDB | 7.2855494e-05
3,798 | Plato: Approximate Analytics over Compressed Time Series with Tight Deterministic Error Guarantees | 2020 | VLDB | 6.7592302e-05
3,944 | AQP++: Connecting Approximate Query Processing With Aggregate Precomputation for Interactive Analytics | 2018 | SIGMOD | 6.6078243e-05
4,029 | Spatial Online Sampling and Aggregation | 2016 | VLDB | 6.51315e-05
5,014 | Dynamically Optimizing Queries over Large Scale Data Platforms | 2014 | SIGMOD | 5.7586174e-05
5,224 | Neighbor-Sensitive Hashing | 2016 | VLDB | 5.6197981e-05
5,252 | Error-bounded Sampling for Analytics on Big Sparse Data | 2014 | VLDB | 5.6024389e-05
5,868 | ABS: a System for Scalable Approximate Queries with Accuracy Guarantees | 2014 | SIGMOD | 5.2959352e-05
6,136 | Scalable Progressive Analytics on Big Data in the Cloud | 2013 | VLDB | 5.1928748e-05
6,298 | Hillview: A trillion-cell spreadsheet for big data | 2019 | VLDB | 5.1226987e-05
6,311 | VergeDB: A Database for IoT Analytics on Edge Devices | 2021 | CIDR | 5.1161316e-05
6,400 | iOLAP: Managing Uncertainty for Efficient Incremental OLAP | 2016 | SIGMOD | 5.0803518e-05
6,411 | Approximate Query Engines: Commercial Challenges and Research Opportunities | 2017 | SIGMOD | 5.0752468e-05
6,493 | Joins on Samples: A Theoretical Guide for Practitioners | 2020 | VLDB | 5.0424713e-05
7,534 | Enabling Efficient and General Subpopulation Analytics in Multidimensional Data Streams | 2022 | VLDB | 4.7180004e-05
8,643 | One Size Does Not Fit All: A Bandit-Based Sampler Combination Framework with Theoretical Guarantees | 2022 | SIGMOD | 4.4777916e-05
9,384 | Sapprox: Enabling Efficient and Accurate Approximations on Sub-datasets with Distribution-aware Online Sampling | 2017 | VLDB | 4.3456129e-05
10,497 | PilotDB: Database-Agnostic Online Approximate Query Processing with A Priori Error Guarantees | 2025 | SIGMOD | 4.1945683e-05
10,981 | Enabling Adaptive Sampling for Intra-Window Join: Simultaneously Optimizing Quantity and Quality | 2024 | SIGMOD | 4.1945683e-05
11,194 | A Step Toward Deep Online Aggregation | 2023 | SIGMOD | 4.1945683e-05
11,539 | FlashP: An Analytical Pipeline for Real-time Forecasting of Time-Series Relational Data | 2021 | VLDB | 4.1945683e-05
11,711 | Demonstration of VerdictDB, the Platform-Independent AQP System | 2018 | SIGMOD | 4.1945683e-05
11,913 | STORM: Spatio-Temporal Online Reasoning and Management of Large Spatio-Temporal Data | 2015 | SIGMOD | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 6 of 6 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
14 | Online Aggregation | 1997 | SIGMOD | 0.0010801504
184 | New Sampling-Based Summary Statistics for Improving Approximate Query Answers | 1998 | SIGMOD | 0.00036625711
217 | Ripple Joins for Online Aggregation | 1999 | SIGMOD | 0.00033536712
1,425 | Scalable Approximate Query Processing With The DBO Engine | 2007 | SIGMOD | 0.00012051353
4,093 | Distributed Online Aggregations | 2009 | VLDB | 6.4558147e-05
5,661 | CONTROL: Continuous Output and Navigation Technology with Refinement On-Line | 1998 | SIGMOD | 5.3840752e-05

Semantically Similar Papers

Overall Rank | Paper | Year | Venue | Pagerank
3,062 | Efficient Multi-way Theta-Join Processing Using MapReduce | 2012 | VLDB | 7.6343994e-05
12,400 | Ad-Hoc Data Processing in the Cloud | 2008 | VLDB | 4.1945683e-05
3,129 | Scalable Big Graph Processing in MapReduce | 2014 | SIGMOD | 7.5008242e-05
15 | Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters | 2007 | SIGMOD | 0.0010654262
1,615 | The Performance of MapReduce: An In-depth Study | 2010 | VLDB | 0.00011132319
3,703 | Multi-Query Optimization in MapReduce Framework | 2014 | VLDB | 6.8289978e-05
2,476 | A Platform for Scalable One-Pass Analytics using MapReduce | 2011 | SIGMOD | 8.6960139e-05
2,674 | Minimal MapReduce Algorithms | 2013 | SIGMOD | 8.3328645e-05
14 | Online Aggregation | 1997 | SIGMOD | 0.0010801504
2,736 | Online Aggregation and Continuous Query support in MapReduce | 2010 | SIGMOD | 8.2043187e-05