Database Paper Browser

Joins on Samples: A Theoretical Guide for Practitioners

Summary: Revisits sample-based joins for approximate query processing (AQP), challenging the view that joining samples is futile and bounding join estimation in terms of output size and variance. Proposes a sampling scheme combining Bernoulli and universe sampling with optimal parameters, plus a distributed variant; validated on SQL/AQP engines. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 12255
Venue: VLDB
Year: 2020
Pagerank: 5.0424713e-05
Overall Rank: 6,493 | 54.84%
DOI: 10.14778/3372721.3372726

Incoming Citations (Sorted by Pagerank)

Showing 8 of 8 citing papers.

Outgoing Citations (Sorted by Pagerank)

Showing 37 of 37 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
14 | Online Aggregation | 1997 | SIGMOD | 0.0010801504
18 | On Random Sampling over Joins | 1999 | SIGMOD | 0.00092385438
204 | Learned Cardinalities: Estimating Correlated Joins with Deep Learning | 2019 | CIDR | 0.00034784455
211 | Join Synopses for Approximate Query Answering | 1999 | SIGMOD | 0.00033981214
217 | Ripple Joins for Online Aggregation | 1999 | SIGMOD | 0.00033536712
943 | Wander Join: Online Aggregation via Random Walks | 2016 | SIGMOD | 0.00015145883
967 | Aqua: A Fast Decision Support System Using Approximate Query Answers | 1999 | VLDB | 0.00014959939
1,064 | Processing Complex Aggregate Queries over Data Streams | 2002 | SIGMOD | 0.00014356481
1,105 | Cardinality Estimation Done Right: Index-Based Join Sampling | 2017 | CIDR | 0.00013990395
1,152 | Blink and It's Done: Interactive Queries on Very Large Data | 2012 | VLDB | 0.00013645792
1,193 | Join Size Estimation Subject to Filter Conditions | 2015 | VLDB | 0.00013414989
1,204 | VerdictDB: Universalizing Approximate Query Processing | 2018 | SIGMOD | 0.00013319541
1,260 | Dynamic Sample Selection for Approximate Query Processing | 2003 | SIGMOD | 0.00012993347
1,323 | Quickr: Lazily Approximating Complex AdHoc Queries in BigData Clusters | 2016 | SIGMOD | 0.00012601997
1,335 | ICICLES: Self-tuning Samples for Approximate Query Answering | 2000 | VLDB | 0.00012502131
1,369 | Random Sampling over Joins Revisited | 2018 | SIGMOD | 0.00012339777
1,464 | Online Aggregation for Large MapReduce Jobs | 2011 | VLDB | 0.00011865546
1,737 | QuickSel: Quick Selectivity Learning with Mixture Models | 2020 | SIGMOD | 0.00010720294
1,758 | Sampling-Based Query Re-Optimization | 2016 | SIGMOD | 0.00010655546
1,797 | Effective Use of Block-Level Sampling in Statistics Estimation | 2004 | SIGMOD | 0.00010523169
1,874 | Knowing When You’re Wrong: Building Fast and Reliable Approximate Query Processing Systems | 2014 | SIGMOD | 0.00010244443
2,202 | A Scalable Hash Ripple Join Algorithm | 2002 | SIGMOD | 9.2987417e-05
2,251 | Vizdom: Interactive Analytics through Pen and Touch | 2015 | VLDB | 9.1986441e-05
2,254 | Two-Level Sampling for Join Size Estimation | 2017 | SIGMOD | 9.1897043e-05
2,365 | The Analytical Bootstrap: a New Method for Fast Error Estimation in Approximate Query Processing | 2014 | SIGMOD | 8.9551432e-05
2,588 | Database Learning: Toward a Database that Becomes Smarter Every Time | 2017 | SIGMOD | 8.4909562e-05
2,779 | Hashed Samples: Selectivity Estimators For Set Similarity Selection Queries | 2008 | VLDB | 8.1320575e-05
3,309 | Distributed Lock Management with RDMA: Decentralization without Starvation | 2018 | SIGMOD | 7.2419042e-05
3,333 | SnappyData: A Unified Cluster for Streaming, Transactions, and Interactive Analytics | 2017 | CIDR | 7.2093479e-05
3,594 | Continuous Sampling for Online Aggregation Over Multiple Queries | 2010 | SIGMOD | 6.9381343e-05
3,708 | Is Min-Wise Hashing Optimal for Summarizing Set Intersection? | 2014 | PODS | 6.8247903e-05
3,835 | I've Seen "Enough": Incrementally Improving Visualizations to Support Rapid Decision Making | 2017 | VLDB | 6.7163364e-05
3,842 | Turbo-Charging Estimate Convergence in DBO | 2009 | VLDB | 6.7102374e-05
4,100 | A Bi-Level Bernoulli Scheme for Database Sampling | 2004 | SIGMOD | 6.4531387e-05
4,245 | A Disk-Based Join With Probabilistic Guarantees | 2005 | SIGMOD | 6.3272687e-05
5,581 | CliffGuard: A Principled Framework for Finding Robust Database Designs | 2015 | SIGMOD | 5.424205e-05
6,411 | Approximate Query Engines: Commercial Challenges and Research Opportunities | 2017 | SIGMOD | 5.0752468e-05

Semantically Similar Papers