Sampling-Based Estimation of the Number of Distinct Values of an Attribute
Summary: Proposes several sampling-based estimators for the number of distinct values (NDV) of an attribute and empirically compares them on highly skewed real-world data. Introduces a hybrid estimator that combines a smoothed jackknife with Shlosser's method, aiming to maximize precision for a given sampling fraction while remaining scalable. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Peter J. Haas
2. Jeffrey F. Naughton
3. S. Seshadri
4. Lynne Stokes
Incoming Citations (Sorted by Pagerank)
Showing 8 of 58 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 8,320 | Effective Change Detection Using Sampling | 2002 | VLDB | 4.5435639e-05 |
| 9,056 | A Data Quality Metric (DQM): How to Estimate the Number of Undetected Errors in Data Sets | 2017 | VLDB | 4.4039656e-05 |
| 10,227 | Sample-based Distinct Cardinality Estimation for Multiple Attributes in Multi-Dataset Queries | 2026 | VLDB | 4.1945683e-05 |
| 10,498 | PLM4NDV: Minimizing Data Access for Number of Distinct Values Estimation with Pre-trained Language Models | 2025 | SIGMOD | 4.1945683e-05 |
| 10,534 | AdaNDV: Adaptive Number of Distinct Value Estimation via Learning to Select and Fuse Estimators | 2025 | VLDB | 4.1945683e-05 |
| 11,194 | A Step Toward Deep Online Aggregation | 2023 | SIGMOD | 4.1945683e-05 |
| 12,060 | Statistics Collection in Oracle Spatial and Graph: Fast Histogram Construction for Complex Geometry Objects | 2013 | VLDB | 4.1945683e-05 |
| 12,531 | Join-Distinct Aggregate Estimation over Update Streams | 2005 | PODS | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 4 of 4 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1 | Access Path Selection in a Relational Database Management System | 1979 | SIGMOD | 0.0040449103 |
| 39 | Statistical Estimators for Relational Algebra Expressions | 1988 | PODS | 0.00074745564 |
| 134 | Processing Aggregate Relational Queries with Hard Time Constraints | 1989 | SIGMOD | 0.00042452811 |
| 139 | Predicate Migration: Optimizing Queries with Expensive Predicates | 1993 | SIGMOD | 0.00042299329 |