Database Paper Browser


Every Row Counts: Combining Sketches and Sampling for Accurate Group-By Result Estimates

Summary: Hybrid estimator combining per-column sketches that capture full-frequency information with small random samples to correct inter-column correlation bias, enabling accurate group-by distinct-count estimates for arbitrary attribute combinations. Achieves near-perfect per-column accuracy, high multi-column accuracy, low integration overhead, and negligible estimation time via an efficient sample-scan algorithm. (summarized by gpt-5-mini on Feb 09 2026)
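The summary describes pairing full-data per-column statistics with a small uniform sample. The Python below is a minimal sketch under our own assumptions, not the paper's estimator. Exact per-column distinct counts stand in for the per-column sketches. The sample-side estimate uses the Guaranteed-Error Estimator (GEE) of Charikar et al., which is a swapped-in technique. The per-column counts are used only to clamp the multi-column estimate to the bounds they imply. The function names `gee_distinct` and `hybrid_group_count` are hypothetical.

```python
import math
import random
from collections import Counter


def gee_distinct(sample_keys, population_size):
    """Guaranteed-Error Estimator (GEE) for the number of distinct values,
    computed from a uniform random sample of a population of the given size.
    Swapped-in technique for illustration; not taken from the paper."""
    n = len(sample_keys)
    if n == 0:
        return 0.0
    freq = Counter(sample_keys)         # value -> occurrences in the sample
    f = Counter(freq.values())          # j -> number of values seen exactly j times
    f1 = f.get(1, 0)                    # singletons are scaled up by sqrt(N/n)
    rest = sum(cnt for j, cnt in f.items() if j >= 2)
    return math.sqrt(population_size / n) * f1 + rest


def hybrid_group_count(columns, exact_col_distinct, sample_idx):
    """Hypothetical hybrid estimate of distinct groups for a multi-column GROUP BY.

    columns: equal-length lists holding the full table column by column
    exact_col_distinct: per-column distinct counts from a full pass
        (a stand-in for per-column sketches; a real system would use a sketch)
    sample_idx: row indices of a uniform random sample
    """
    n_rows = len(columns[0])
    # Build the combined group key for every sampled row.
    sample_keys = [tuple(col[i] for col in columns) for i in sample_idx]
    est = gee_distinct(sample_keys, n_rows)
    # Every value of any single column occurs in at least one group.
    lower = max(exact_col_distinct)
    # Groups cannot exceed the cross product of per-column counts or the row count.
    upper = min(math.prod(exact_col_distinct), n_rows)
    return min(max(est, lower), upper)


# Demo: column b is fully determined by column a, so there are only 100 (a, b) groups.
random.seed(0)
N = 10_000
a = [i % 100 for i in range(N)]
b = [i % 50 for i in range(N)]
exact = [len(set(a)), len(set(b))]            # full-pass per-column distinct counts
sample_idx = random.sample(range(N), 1_000)   # 10% uniform sample without replacement
estimate = hybrid_group_count([a, b], exact, sample_idx)
independence_guess = min(exact[0] * exact[1], N)
print(f"estimated groups: {estimate:.1f} (true: 100, independence guess: {independence_guess})")
```

Assuming the columns were independent would suggest up to 5,000 groups here, while the sample reflects the actual correlation between `a` and `b`; the clamp keeps the sample-based figure consistent with what the per-column counts guarantee.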

Paper ID: 329
Venue: CIDR
Year: 2019
Pagerank: 6.8295759e-05
Overall Rank: 3,702 | 74.25%
DOI: -

[Figure: Incoming non-self citations over time]

Authors

Incoming Citations (Sorted by Pagerank)

Showing 16 of 16 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
735 | Umbra: A Disk-Based System with In-Memory Performance | 2020 | CIDR | 0.00017452467
2,275 | Adopting Worst-Case Optimal Joins in Relational Database Systems | 2020 | VLDB | 9.1262202e-05
4,833 | MNC: Structure-Exploiting Sparsity Estimation for Matrix Expressions | 2019 | SIGMOD | 5.8916346e-05
5,200 | SetSketch: Filling the Gap between MinHash and HyperLogLog | 2021 | VLDB | 5.6337581e-05
6,604 | MotherDuck: DuckDB in the cloud and in the client | 2024 | CIDR | 4.9971118e-05
6,969 | LpBound: Pessimistic Cardinality Estimation using ℓp-Norms of Degree Sequences | 2025 | SIGMOD | 4.8799937e-05
7,033 | DuckPGQ: Bringing SQL/PGQ to DuckDB | 2023 | VLDB | 4.8551607e-05
7,358 | Weighted Distinct Sampling: Cardinality Estimation for SPJ Queries | 2021 | SIGMOD | 4.7529363e-05
7,667 | Fast Detection of Denial Constraint Violations | 2022 | VLDB | 4.683767e-05
7,709 | UltraLogLog: A Practical and More Space-Efficient Alternative to HyperLogLog for Approximate Distinct Counting | 2024 | VLDB | 4.6720658e-05
8,275 | Adaptive Factorization Using Linear-Chained Hash Tables | 2025 | CIDR | 4.5439841e-05
8,680 | A Practical Approach to Groupjoin and Nested Aggregates | 2021 | VLDB | 4.4694927e-05
9,227 | Panakos: Chasing the Tails for Multidimensional Data Streams | 2023 | VLDB | 4.3692732e-05
10,012 | A Fast, Mergeable, and LDP Compatible Sketch for Counting the Number of Distinct Values in Fully Dynamic Tables | 2026 | SIGMOD | 4.1945683e-05
10,227 | Sample-based Distinct Cardinality Estimation for Multiple Attributes in Multi-Dataset Queries | 2026 | VLDB | 4.1945683e-05
11,254 | Asymptotically Better Query Optimization Using Indexed Algebra | 2023 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 14 of 14 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.


Semantically Similar Papers