Joins on Samples: A Theoretical Guide for Practitioners
Summary: Revisits sample-based joins for approximate query processing (AQP), challenging the view that joining samples is futile and characterizing how well a join can be estimated from samples in terms of output size and variance. Proposes a sampling scheme built on Bernoulli and universe sampling with optimal parameters, plus a distributed variant; validated on SQL/AQP engines. (summarized by gpt-5-nano on Feb 09 2026)
[Chart: Incoming Non-self Citations Over Time]
Authors
1. Dawei Huang
2. Dong Young Yoon
3. Seth Pettie
4. Barzan Mozafari
Incoming Citations (Sorted by Pagerank)
Showing 8 of 8 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 3,779 | Instance-Optimized Data Layouts for Cloud Analytics Workloads | 2021 | SIGMOD | 6.7747205e-05 |
| 3,824 | Correlation Sketches for Approximate Join-Correlation Queries | 2021 | SIGMOD | 6.7260705e-05 |
| 3,924 | A Unified Deep Model of Learning from both Data and Queries for Cardinality Estimation | 2021 | SIGMOD | 6.6271553e-05 |
| 5,024 | Towards Distribution-aware Query Answering in Data Markets | 2022 | VLDB | 5.7535043e-05 |
| 8,643 | One Size Does Not Fit All: A Bandit-Based Sampler Combination Framework with Theoretical Guarantees | 2022 | SIGMOD | 4.4777916e-05 |
| 9,118 | Towards Observability for Production Machine Learning Pipelines | 2022 | VLDB | 4.3928288e-05 |
| 10,497 | PilotDB: Database-Agnostic Online Approximate Query Processing with A Priori Error Guarantees | 2025 | SIGMOD | 4.1945683e-05 |
| 10,981 | Enabling Adaptive Sampling for Intra-Window Join: Simultaneously Optimizing Quantity and Quality | 2024 | SIGMOD | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 37 of 37 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 92 | Practical Selectivity Estimation through Adaptive Sampling | 1990 | SIGMOD | 0.00051315959 |
| 549 | Tracking Join and Self-Join Sizes in Limited Storage | 1999 | PODS | 0.00020376603 |
| 1,874 | Knowing When You’re Wrong: Building Fast and Reliable Approximate Query Processing Systems | 2014 | SIGMOD | 0.00010244443 |
| 1,255 | Fixed-Precision Estimation of Join Selectivity | 1993 | PODS | 0.00013024064 |
| 2,254 | Two-Level Sampling for Join Size Estimation | 2017 | SIGMOD | 9.1897043e-05 |
| 2,580 | Sample + Seek: Approximating Aggregates with Distribution Precision Guarantee | 2016 | SIGMOD | 8.5058814e-05 |
| 1,369 | Random Sampling over Joins Revisited | 2018 | SIGMOD | 0.00012339777 |
| 8,959 | Reservoir Sampling over Joins | 2024 | SIGMOD | 4.4206222e-05 |
| 6,740 | Combining Aggregation and Sampling (Nearly) Optimally for Approximate Query Processing | 2021 | SIGMOD | 4.944395e-05 |
| 18 | On Random Sampling over Joins | 1999 | SIGMOD | 0.00092385438 |