Efficient Estimation of Inclusion Coefficient using HyperLogLog Sketches
Summary: Introduces BML, a HyperLogLog-based estimator for the inclusion coefficient (the fraction of A’s values contained in B), which achieves significantly lower error than Bottom-k sketch baselines on synthetic and real data. Also shows how HyperLogLog sketches can be maintained incrementally under deletions using only a constant amount of additional memory, with empirical validation on TPC-H, TPC-DS, and real-world databases. (summarized by gpt-5-nano on Feb 09 2026)
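The summary describes estimating the inclusion coefficient |A ∩ B| / |A| from HyperLogLog sketches. The BML estimator itself is not reproduced here. As a rough point of reference, the minimal Python sketch below implements the simpler inclusion-exclusion baseline: estimate |A|, |B|, and |A ∪ B| (the union sketch is the register-wise max of the two sketches), then take (|A| + |B| - |A ∪ B|) / |A|. The precision `P = 12`, the blake2b hash, and all function names are illustrative assumptions, not the paper's choices.

```python
import hashlib
import math

P = 12          # assumed precision: the sketch has M = 2**P registers
M = 1 << P


def _hash64(value) -> int:
    # 64-bit hash of the value's string form (blake2b with an 8-byte digest)
    digest = hashlib.blake2b(str(value).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def hll_sketch(values) -> list:
    """Build a HyperLogLog register array for an iterable of values."""
    regs = [0] * M
    for v in values:
        h = _hash64(v)
        idx = h >> (64 - P)                  # top P bits select the register
        rest = h & ((1 << (64 - P)) - 1)     # remaining 64 - P bits
        # rank = 1-based position of the leftmost 1-bit in the remaining bits
        rank = (64 - P) - rest.bit_length() + 1
        regs[idx] = max(regs[idx], rank)
    return regs


def hll_estimate(regs) -> float:
    """Standard HLL cardinality estimate with linear counting for small sets."""
    m = len(regs)
    alpha = 0.7213 / (1 + 1.079 / m)
    raw = alpha * m * m / sum(2.0 ** -r for r in regs)
    zeros = regs.count(0)
    if raw <= 2.5 * m and zeros > 0:
        return m * math.log(m / zeros)
    return raw


def merge(a, b) -> list:
    # The sketch of A ∪ B is the register-wise maximum of the two sketches
    return [max(x, y) for x, y in zip(a, b)]


def inclusion_coefficient(sa, sb) -> float:
    """Inclusion-exclusion estimate of |A ∩ B| / |A|, clipped to [0, 1]."""
    na = hll_estimate(sa)
    if na <= 0:
        return 0.0
    nb = hll_estimate(sb)
    nu = hll_estimate(merge(sa, sb))
    inter = na + nb - nu
    return min(1.0, max(0.0, inter / na))


if __name__ == "__main__":
    a = hll_sketch(range(0, 10_000))        # A = {0, ..., 9999}
    b = hll_sketch(range(5_000, 20_000))    # B = {5000, ..., 19999}
    print(inclusion_coefficient(a, b))      # true value is 0.5
```

For the sets in the usage block the true coefficient is 0.5, and the estimate should land near it. Because the intersection is obtained by subtracting three noisy estimates, this baseline's relative error grows as |A ∩ B| shrinks relative to |A ∪ B|.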
Authors
1. Azade Nazi
2. Bolin Ding
3. Vivek Narasayya
4. Surajit Chaudhuri
Incoming Citations (Sorted by Pagerank)
Showing 4 of 4 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 3,702 | Every Row Counts: Combining Sketches and Sampling for Accurate Group-By Result Estimates | 2019 | CIDR | 6.8295759e-05 |
| 5,200 | SetSketch: Filling the Gap between MinHash and HyperLogLog | 2021 | VLDB | 5.6337581e-05 |
| 6,261 | The Cosmos Big Data Platform at Microsoft: Over a Decade of Progress and a Decade to Look Forward | 2021 | VLDB | 5.1350714e-05 |
| 7,709 | UltraLogLog: A Practical and More Space-Efficient Alternative to HyperLogLog for Approximate Distinct Counting | 2024 | VLDB | 4.6720658e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 7 of 7 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 22 | SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets | 2008 | VLDB | 0.0008456613 |
| 727 | On Synopses for Distinct-Value Estimation Under Multiset Operations | 2007 | SIGMOD | 0.00017508726 |
| 1,625 | Data Profiling with Metanome | 2015 | VLDB | 0.00011094926 |
| 1,664 | On Multi-Column Foreign Key Discovery | 2010 | VLDB | 0.00010976887 |
| 3,708 | Is Min-Wise Hashing Optimal for Summarizing Set Intersection? | 2014 | PODS | 6.8247903e-05 |
| 4,784 | Divide & Conquer-based Inclusion Dependency Discovery | 2015 | VLDB | 5.9240851e-05 |
| 5,486 | Fast Foreign-Key Detection in Microsoft SQL Server PowerPivot for Excel | 2014 | VLDB | 5.4811603e-05 |