Every Row Counts: Combining Sketches and Sampling for Accurate Group-By Result Estimates
Summary: A hybrid estimator that combines per-column sketches, built over the full data to capture frequency information, with small random samples that correct the bias caused by correlations between columns. This enables accurate group-by distinct-count estimates for arbitrary attribute combinations. The approach achieves near-perfect per-column accuracy, high multi-column accuracy, low integration overhead, and negligible estimation time thanks to an efficient sample-scan algorithm.
(summarized by gpt-5-mini on Feb 09 2026)
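The summary describes the idea only at a high level. The sketch below is a hypothetical Python illustration of the general pattern (statistics over every row bounding a sample-based estimate), not the paper's algorithm. Three assumptions: exact per-column distinct counts stand in for the per-column sketches; the Guaranteed-Error Estimator (GEE) from "Towards Estimation Error Guarantees for Distinct Values" (PODS 2000, listed among the cited papers below) is swapped in as the sample-based estimator; and the functions `column_distinct_counts`, `gee_estimate`, and `estimate_groups` are illustrative names only.

```python
from __future__ import annotations

import math
import random
from collections import Counter
from typing import Sequence


def column_distinct_counts(rows: Sequence[tuple], num_cols: int) -> list[int]:
    # Exact per-column distinct counts over ALL rows. Hypothetical stand-in
    # for per-column sketches; a real system would use a compact sketch
    # (e.g. HyperLogLog) instead of materializing sets.
    return [len({row[i] for row in rows}) for i in range(num_cols)]


def gee_estimate(sample_keys: Sequence[tuple], total_rows: int) -> float:
    # GEE (Charikar et al., PODS 2000): sqrt(N/n) * f1 + sum_{j>=2} f_j,
    # where f_j is the number of keys seen exactly j times in the sample.
    n = len(sample_keys)
    if n == 0:
        return 0.0
    freq_of_freq = Counter(Counter(sample_keys).values())
    f1 = freq_of_freq.get(1, 0)
    seen_more_than_once = sum(c for j, c in freq_of_freq.items() if j >= 2)
    return math.sqrt(total_rows / n) * f1 + seen_more_than_once


def estimate_groups(rows: Sequence[tuple], group_cols: Sequence[int],
                    sample_size: int, seed: int = 0) -> float:
    # Hypothetical hybrid estimate of the number of groups produced by
    # GROUP BY over the columns at positions group_cols.
    total = len(rows)
    projected = [tuple(row[c] for c in group_cols) for row in rows]

    # Full-data statistics: the group count is at least the largest
    # per-column distinct count and at most min(product, row count).
    per_col = column_distinct_counts(projected, len(group_cols))
    lower = max(per_col)
    upper = min(math.prod(per_col), total)

    # The sample-based estimate reflects correlation between grouping columns.
    sample = random.Random(seed).sample(projected, min(sample_size, total))
    raw = gee_estimate(sample, total)

    # Clamp the sample estimate into the range the full-data statistics allow.
    return float(min(max(raw, lower), upper))


if __name__ == "__main__":
    correlated = [(i % 10, (i % 10) * 2) for i in range(1000)]
    independent = [(i % 10, (i // 10) % 10) for i in range(1000)]
    print(estimate_groups(correlated, [0, 1], sample_size=200))
    print(estimate_groups(independent, [0, 1], sample_size=200))
```

The clamp is the point of the illustration: with perfectly correlated columns the sample keeps the estimate near the true group count rather than the independence product, while the full-data bounds can cap an overestimate (or lift an underestimate) from the sample-only estimator. With approximate sketches instead of exact counts, these bounds would themselves be estimates.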
- Paper ID: 329
- Venue: CIDR
- Year: 2019
- Pagerank: 6.8295759e-05
- Overall Rank: 3,702 (74.25%)
- DOI: (not listed)
Incoming Citations (Sorted by Pagerank)
Showing 16 of 16 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 735 | Umbra: A Disk-Based System with In-Memory Performance | 2020 | CIDR | 0.00017452467 |
| 2,275 | Adopting Worst-Case Optimal Joins in Relational Database Systems | 2020 | VLDB | 9.1262202e-05 |
| 4,833 | MNC: Structure-Exploiting Sparsity Estimation for Matrix Expressions | 2019 | SIGMOD | 5.8916346e-05 |
| 5,200 | SetSketch: Filling the Gap between MinHash and HyperLogLog | 2021 | VLDB | 5.6337581e-05 |
| 6,604 | MotherDuck: DuckDB in the cloud and in the client | 2024 | CIDR | 4.9971118e-05 |
| 6,969 | LpBound: Pessimistic Cardinality Estimation using ℓp-Norms of Degree Sequences | 2025 | SIGMOD | 4.8799937e-05 |
| 7,033 | DuckPGQ: Bringing SQL/PGQ to DuckDB | 2023 | VLDB | 4.8551607e-05 |
| 7,358 | Weighted Distinct Sampling: Cardinality Estimation for SPJ Queries | 2021 | SIGMOD | 4.7529363e-05 |
| 7,667 | Fast Detection of Denial Constraint Violations | 2022 | VLDB | 4.683767e-05 |
| 7,709 | UltraLogLog: A Practical and More Space-Efficient Alternative to HyperLogLog for Approximate Distinct Counting | 2024 | VLDB | 4.6720658e-05 |
| 8,275 | Adaptive Factorization Using Linear-Chained Hash Tables | 2025 | CIDR | 4.5439841e-05 |
| 8,680 | A Practical Approach to Groupjoin and Nested Aggregates | 2021 | VLDB | 4.4694927e-05 |
| 9,227 | Panakos: Chasing the Tails for Multidimensional Data Streams | 2023 | VLDB | 4.3692732e-05 |
| 10,012 | A Fast, Mergeable, and LDP Compatible Sketch for Counting the Number of Distinct Values in Fully Dynamic Tables | 2026 | SIGMOD | 4.1945683e-05 |
| 10,227 | Sample-based Distinct Cardinality Estimation for Multiple Attributes in Multi-Dataset Queries | 2026 | VLDB | 4.1945683e-05 |
| 11,254 | Asymptotically Better Query Optimization Using Indexed Algebra | 2023 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 14 of 14 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 59 | Sampling-Based Estimation of the Number of Distinct Values of an Attribute | 1995 | VLDB | 0.00064501896 |
| 71 | How Good Are Query Optimizers, Really? | 2016 | VLDB | 0.00059038975 |
| 224 | CORDS: Automatic Discovery of Correlations and Soft Functional Dependencies | 2004 | SIGMOD | 0.00032746205 |
| 308 | Distinct Sampling for Highly-Accurate Answers to Distinct Values Queries and Event Reports | 2001 | VLDB | 0.00028142852 |
| 378 | Towards Estimation Error Guarantees for Distinct Values | 2000 | PODS | 0.0002497492 |
| 1,047 | Functional Dependency Discovery: An Experimental Evaluation of Seven Algorithms | 2015 | VLDB | 0.00014459715 |
| 1,105 | Cardinality Estimation Done Right: Index-Based Join Sampling | 2017 | CIDR | 0.00013990395 |
| 1,683 | Cardinality Estimation: An Experimental Survey | 2018 | VLDB | 0.00010922679 |
| 1,981 | Improved Selectivity Estimation by Combining Knowledge from Sampling and Synopses | 2018 | VLDB | 9.8687545e-05 |
| 2,580 | Sample + Seek: Approximating Aggregates with Distribution Precision Guarantee | 2016 | SIGMOD | 8.5058814e-05 |
| 3,013 | Cardinality Estimation Using Sample Views with Quality Assurance | 2007 | SIGMOD | 7.7137441e-05 |
| 4,571 | Adaptive Statistics in Oracle 12c | 2017 | VLDB | 6.0773174e-05 |
| 4,831 | DigitHist: a Histogram-Based Data Summary with Tight Error Bounds | 2017 | VLDB | 5.8924198e-05 |
| 5,361 | Efficient Estimation of Inclusion Coefficient using HyperLogLog Sketches | 2018 | VLDB | 5.547935e-05 |
Semantically Similar Papers