A Bi-Level Bernoulli Scheme for Database Sampling
Summary: Bi-level Bernoulli sampling combines row-level and page-level sampling for ISO-style sampling queries. It gives the user a tunable trade-off between processing speed and statistical precision, supported by SQL extensions and data-aware optimization of the sampling parameters. A bang-bang policy governed by a page-heterogeneity index (PHI) guides the choice of parameters. PHI is estimated either from a pilot sample or from catalog statistics, and a heuristic achieves near-optimal accuracy on clustered or skewed data across 1,100 experiments. (summarized by gpt-5-nano on Feb 09 2026)
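The summary describes the scheme only in words. The sketch below assumes the two-stage form the name suggests: each page is kept with probability p, each row on a kept page is kept independently with probability r (so every row is included with probability q = p * r), and a column SUM is estimated by scaling the sample sum by 1 / (p * r), a Horvitz-Thompson estimator. For this design the estimator's variance is ((1 - p) / p) * sum_j T_j^2 + ((1 - r) / (p * r)) * sum_i y_i^2, where T_j is the total of page j and y_i is a row value. The function names, the toy table, and the use of "expected pages read" as a cost proxy are illustrative assumptions, not the paper's SQL syntax, optimizer, or PHI definition.

```python
import random
from typing import List, Sequence


def bilevel_bernoulli_sample(pages: Sequence[Sequence[float]], p: float, r: float,
                             rng: random.Random) -> List[float]:
    """Two-stage Bernoulli sample (illustrative sketch): keep each page with
    probability p, then keep each row of a kept page independently with probability r."""
    if not (0.0 < p <= 1.0 and 0.0 < r <= 1.0):
        raise ValueError("p and r must lie in (0, 1]")
    sample: List[float] = []
    for page in pages:
        if rng.random() < p:            # page-level Bernoulli trial
            for value in page:
                if rng.random() < r:    # row-level Bernoulli trial
                    sample.append(value)
    return sample


def ht_sum_estimate(sample: Sequence[float], p: float, r: float) -> float:
    """Horvitz-Thompson estimate of the column SUM: every row is included
    with probability p * r, so scale the sample sum by 1 / (p * r)."""
    return sum(sample) / (p * r)


def ht_sum_variance(pages: Sequence[Sequence[float]], p: float, r: float) -> float:
    """Exact variance of ht_sum_estimate under the two-stage design above."""
    page_total_sq = sum(sum(page) ** 2 for page in pages)   # sum over pages of T_j^2
    row_sq = sum(v * v for page in pages for v in page)       # sum over rows of y_i^2
    return (1 - p) / p * page_total_sq + (1 - r) / (p * r) * row_sq


if __name__ == "__main__":
    # Hypothetical table: three pages of four rows; page 2 holds the large values.
    table = [[1, 2, 3, 4], [10, 10, 10, 10], [0, 5, 0, 5]]
    # Every setting has the same overall row-sampling rate q = p * r = 0.25.
    for p, r in [(1.0, 0.25), (0.5, 0.5), (0.25, 1.0)]:
        expected_pages_read = p * len(table)   # assumed I/O cost proxy
        print(p, r, expected_pages_read, ht_sum_variance(table, p, r))
```

Running the script prints `1.0 0.25 3.0 1440.0`, `0.5 0.5 1.5 2760.0`, and `0.25 1.0 0.75 5400.0`. All three settings sample 25% of the rows on average, but on this toy table pure row-level sampling (p = 1) reads every page and has the smallest variance, while pure page-level sampling (r = 1) reads the fewest pages and has the largest variance. This illustrates the kind of speed versus precision trade-off that the choice of parameters controls.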
Authors
1. Peter J. Haas
2. Christian König
Incoming Citations (Sorted by PageRank)
Showing 9 of 9 citing papers.
| Overall Rank | Citing Paper | Year | Venue | PageRank |
|---|---|---|---|---|
| 2,779 | Hashed Samples: Selectivity Estimators For Set Similarity Selection Queries | 2008 | VLDB | 8.1320575e-05 |
| 3,013 | Cardinality Estimation Using Sample Views with Quality Assurance | 2007 | SIGMOD | 7.7137441e-05 |
| 4,435 | Sampling Dirty Data for Matching Attributes | 2010 | SIGMOD | 6.1918164e-05 |
| 5,140 | A Random Walk Approach to Sampling Hidden Databases | 2007 | SIGMOD | 5.668209e-05 |
| 6,286 | A Dip in the Reservoir: Maintaining Sample Synopses of Evolving Datasets | 2006 | VLDB | 5.1280225e-05 |
| 6,493 | Joins on Samples: A Theoretical Guide for Practitioners | 2020 | VLDB | 5.0424713e-05 |
| 9,384 | Sapprox: Enabling Efficient and Accurate Approximations on Sub-datasets with Distribution-aware Online Sampling | 2017 | VLDB | 4.3456129e-05 |
| 10,337 | Efficient Approximate Query Processing with Block Sampling | 2025 | CIDR | 4.1945683e-05 |
| 10,981 | Enabling Adaptive Sampling for Intra-Window Join: Simultaneously Optimizing Quantity and Quality | 2024 | SIGMOD | 4.1945683e-05 |
Outgoing Citations (Sorted by PageRank)
Showing 6 of 6 cited papers.
Only citations to other VLDB, SIGMOD, CIDR, or PODS papers in this database are counted.
| Overall Rank | Cited Paper | Year | Venue | PageRank |
|---|---|---|---|---|
| 14 | Online Aggregation | 1997 | SIGMOD | 0.0010801504 |
| 18 | On Random Sampling over Joins | 1999 | SIGMOD | 0.00092385438 |
| 39 | Statistical Estimators for Relational Algebra Expressions | 1988 | PODS | 0.00074745564 |
| 46 | Simple Random Sampling from Relational Databases | 1986 | VLDB | 0.00070894702 |
| 211 | Join Synopses for Approximate Query Answering | 1999 | SIGMOD | 0.00033981214 |
| 553 | Bifocal Sampling for Skew-Resistant Join Size Estimation | 1996 | SIGMOD | 0.00020272061 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | PageRank |
|---|---|---|---|---|
| 6,493 | Joins on Samples: A Theoretical Guide for Practitioners | 2020 | VLDB | 5.0424713e-05 |
| 184 | New Sampling-Based Summary Statistics for Improving Approximate Query Answers | 1998 | SIGMOD | 0.00036625711 |
| 3,702 | Every Row Counts: Combining Sketches and Sampling for Accurate Group-By Result Estimates | 2019 | CIDR | 6.8295759e-05 |
| 2,254 | Two-Level Sampling for Join Size Estimation | 2017 | SIGMOD | 9.1897043e-05 |
| 18 | On Random Sampling over Joins | 1999 | SIGMOD | 0.00092385438 |
| 5,252 | Error-bounded Sampling for Analytics on Big Sparse Data | 2014 | VLDB | 5.6024389e-05 |
| 6,190 | Maintaining Bernoulli Samples over Evolving Multisets | 2007 | PODS | 5.1645517e-05 |
| 92 | Practical Selectivity Estimation through Adaptive Sampling | 1990 | SIGMOD | 0.00051315959 |
| 530 | Random Sampling for Histogram Construction: How much is enough? | 1998 | SIGMOD | 0.00020803682 |
| 46 | Simple Random Sampling from Relational Databases | 1986 | VLDB | 0.00070894702 |