A Disk-Based Join With Probabilistic Guarantees
Summary: A disk-based join algorithm for aggregate queries that maintains, throughout its execution, an online statistical estimator of the final answer together with probabilistic confidence bounds. Users can monitor progress and stop early once the estimate is accurate enough, or run the join to completion in a total time comparable to that of other common join algorithms. Prior online joins either lack such guarantees or provide them only while the input fits in main memory. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Christopher Jermaine
2. Alin Dobra
3. Subramanian Arumugam
4. Shantanu Joshi
5. Abhijit Pol
Incoming Citations (Sorted by Pagerank)
Showing 7 of 7 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1,425 | Scalable Approximate Query Processing With The DBO Engine | 2007 | SIGMOD | 0.00012051353 |
| 1,770 | ParaTimer: A Progress Indicator for MapReduce DAGs | 2010 | SIGMOD | 0.00010618229 |
| 3,594 | Continuous Sampling for Online Aggregation Over Multiple Queries | 2010 | SIGMOD | 6.9381343e-05 |
| 4,030 | Revisiting Reuse for Approximate Query Processing | 2017 | VLDB | 6.5129665e-05 |
| 4,093 | Distributed Online Aggregations | 2009 | VLDB | 6.4558147e-05 |
| 6,493 | Joins on Samples: A Theoretical Guide for Practitioners | 2020 | VLDB | 5.0424713e-05 |
| 9,317 | Are Joins over LSM-trees Ready? Take RocksDB as an Example | 2025 | VLDB | 4.3556432e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 11 of 11 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 14 | Online Aggregation | 1997 | SIGMOD | 0.0010801504 |
| 18 | On Random Sampling over Joins | 1999 | SIGMOD | 0.00092385438 |
| 39 | Statistical Estimators for Relational Algebra Expressions | 1988 | PODS | 0.00074745564 |
| 134 | Processing Aggregate Relational Queries with Hard Time Constraints | 1989 | SIGMOD | 0.00042452811 |
| 217 | Ripple Joins for Online Aggregation | 1999 | SIGMOD | 0.00033536712 |
| 357 | Random Sampling from B+ trees | 1989 | VLDB | 0.00026020098 |
| 553 | Bifocal Sampling for Skew-Resistant Join Size Estimation | 1996 | SIGMOD | 0.00020272061 |
| 783 | Random Sampling from Hash Files | 1990 | SIGMOD | 0.00016704834 |
| 2,202 | A Scalable Hash Ripple Join Algorithm | 2002 | SIGMOD | 9.2987417e-05 |
| 3,204 | Progressive Merge Join: A Generic and Non-Blocking Sort-Based Join Algorithm | 2002 | VLDB | 7.3889212e-05 |
| 5,511 | On Producing Join Results Early | 2003 | PODS | 5.4699346e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 12,531 | Join-Distinct Aggregate Estimation over Update Streams | 2005 | PODS | 4.1945683e-05 |
| 7,623 | Optimizing Probabilistic Query Processing on Continuous Uncertain Data | 2011 | VLDB | 4.6933659e-05 |
| 9,591 | Constructing Join Histograms from Histograms with q-error Guarantees | 2016 | SIGMOD | 4.3204659e-05 |
| 5,104 | Guaranteeing the O~(AGM/OUT) Runtime for Uniform Sampling and Size Estimation over Joins | 2023 | PODS | 5.6946113e-05 |
| 8,205 | PR-Join: A Non-Blocking Join Achieving Higher Early Result Rate with Statistical Guarantees | 2010 | SIGMOD | 4.5593375e-05 |
| 8,959 | Reservoir Sampling over Joins | 2024 | SIGMOD | 4.4206222e-05 |
| 211 | Join Synopses for Approximate Query Answering | 1999 | SIGMOD | 0.00033981214 |
| 217 | Ripple Joins for Online Aggregation | 1999 | SIGMOD | 0.00033536712 |
| 12,567 | Online Estimation For Subset-Based SQL Queries | 2005 | VLDB | 4.1945683e-05 |
| 12,191 | Efficient Rank Join with Aggregation Constraints | 2011 | VLDB | 4.1945683e-05 |