Database Paper Browser


Towards Estimation Error Guarantees for Distinct Values

Summary: Proves that any estimator of the number of distinct values that reads only a sublinear sample must incur large error on some natural distributions; avoiding this requires reading a large fraction of the data. Presents an estimator that matches this lower bound, along with practical heuristics for typical distributions, with empirical validation. (summarized by gpt-5-mini on Feb 09 2026)

Paper ID: 1217
Venue: PODS
Year: 2000
Pagerank: 0.0002497492
Overall Rank: 378 | 97.38%
DOI: -

Incoming Citations (Sorted by Pagerank)

Showing 50 of 59 citing papers.

Rank Citing Paper Year Venue Pagerank
43 Models and Issues in Data Stream Systems 2002 PODS 0.00072723062
308 Distinct Sampling for Highly-Accurate Answers to Distinct Values Queries and Event Reports 2001 VLDB 0.00028142852
449 Approximate Query Processing: Taming the TeraBytes! A Tutorial 2001 VLDB 0.00022846068
629 Preventing Bad Plans by Bounding the Impact of Cardinality Estimation Errors 2009 VLDB 0.00018942366
727 On Synopses for Distinct-Value Estimation Under Multiset Operations 2007 SIGMOD 0.00017508726
921 Democratizing Data Science through Interactive Curation of ML Pipelines 2019 SIGMOD 0.00015337438
1,574 Approximate Query Processing: No Silver Bullet 2017 SIGMOD 0.00011287495
1,683 Cardinality Estimation: An Experimental Survey 2018 VLDB 0.00010922679
1,758 Sampling-Based Query Re-Optimization 2016 SIGMOD 0.00010655546
1,797 Effective Use of Block-Level Sampling in Statistics Estimation 2004 SIGMOD 0.00010523169
1,967 Compressed Linear Algebra for Large-Scale Machine Learning 2016 VLDB 9.9131712e-05
2,184 A Sample-and-Clean Framework for Fast and Accurate Query Processing on Dirty Data 2014 SIGMOD 9.3429789e-05
2,282 Summarizing and Mining Inverse Distributions on Data Streams via Dynamic Inverse Sampling 2005 VLDB 9.1073603e-05
2,365 The Analytical Bootstrap: a New Method for Fast Error Estimation in Approximate Query Processing 2014 SIGMOD 8.9551432e-05
2,580 Sample + Seek: Approximating Aggregates with Distribution Precision Guarantee 2016 SIGMOD 8.5058814e-05
2,779 Hashed Samples: Selectivity Estimators For Set Similarity Selection Queries 2008 VLDB 8.1320575e-05
2,797 Query-Oriented Data Cleaning with Oracles 2015 SIGMOD 8.1108589e-05
2,837 Correlation Maps: A Compressed Access Method for Exploiting Soft Functional Dependencies 2009 VLDB 8.0414149e-05
2,878 Sampling Time-Based Sliding Windows in Bounded Space 2008 SIGMOD 7.9706235e-05
3,013 Cardinality Estimation Using Sample Views with Quality Assurance 2007 SIGMOD 7.7137441e-05
3,050 Comparing Data Streams Using Hamming Norms (How to Zero In) 2002 VLDB 7.6512619e-05
3,102 Processing Set Expressions over Continuous Update Streams 2003 SIGMOD 7.5586568e-05
3,167 Relational Confidence Bounds Are Easy With The Bootstrap 2005 SIGMOD 7.4523397e-05
3,310 Optimal and Approximate Computation of Summary Statistics for Range Aggregates 2001 PODS 7.2408955e-05
3,558 Approximate Selection with Guarantees using Proxies 2020 VLDB 6.9765724e-05
3,702 Every Row Counts: Combining Sketches and Sampling for Accurate Group-By Result Estimates 2019 CIDR 6.8295759e-05
3,824 Correlation Sketches for Approximate Join-Correlation Queries 2021 SIGMOD 6.7260705e-05
3,867 CORADD: Correlation Aware Database Designer for Materialized Views and Indexes 2010 VLDB 6.683173e-05
4,623 Automated Generation of Materialized Views in Oracle 2020 VLDB 6.0411909e-05
4,833 MNC: Structure-Exploiting Sparsity Estimation for Matrix Expressions 2019 SIGMOD 5.8916346e-05
5,014 Dynamically Optimizing Queries over Large Scale Data Platforms 2014 SIGMOD 5.7586174e-05
5,340 Efficiently Approximating Query Optimizer Plan Diagrams 2008 VLDB 5.5623066e-05
5,868 ABS: a System for Scalable Approximate Queries with Accuracy Guarantees 2014 SIGMOD 5.2959352e-05
5,905 Exploiting Ordered Dictionaries to Efficiently Construct Histograms with Q-Error Guarantees in SAP HANA 2014 SIGMOD 5.2788785e-05
6,157 Compression Aware Physical Database Design 2011 VLDB 5.1801143e-05
6,191 Automatic Optimization of Matrix Implementations for Distributed Machine Learning and Linear Algebra 2021 SIGMOD 5.1642282e-05
6,278 Uncertainty Aware Query Execution Time Prediction 2014 VLDB 5.1309442e-05
6,368 Pre-training Summarization Models of Structured Datasets for Cardinality Estimation 2022 VLDB 5.0937722e-05
6,374 Optimization of Conjunctive Predicates for Main Memory Column Stores 2016 VLDB 5.0927058e-05
6,763 Robustness Metrics for Relational Query Execution Plans 2018 VLDB 4.9338479e-05
6,941 Estimating the Impact of Unknown Unknowns on Aggregate Query Results 2016 SIGMOD 4.8924e-05
7,415 Efficient and Scalable Statistics Gathering for Large Databases in Oracle 11g 2008 SIGMOD 4.7355557e-05
7,610 Learning to be a Statistician: Learned Estimator for Number of Distinct Values 2022 VLDB 4.6965039e-05
7,667 Fast Detection of Denial Constraint Violations 2022 VLDB 4.683767e-05
8,158 MONSOON: Multi-Step Optimization and Execution of Queries with Partially Obscured Predicates 2020 SIGMOD 4.5730772e-05
8,350 alpha to omega: The Greek Alphabet of Sampling 2020 CIDR 4.5404832e-05
8,834 ByteCard: Enhancing ByteDance’s Data Warehouse with Learned Cardinality Estimation 2024 SIGMOD 4.4394021e-05
8,835 Learning-based Property Estimation with Polynomials 2024 SIGMOD 4.4394021e-05
8,893 Histograms Reloaded: The Merits of Bucket Diversity 2010 SIGMOD 4.4275272e-05
9,227 Panakos: Chasing the Tails for Multidimensional Data Streams 2023 VLDB 4.3692732e-05

Outgoing Citations (Sorted by Pagerank)

Showing 5 of 5 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

