Learning to be a Statistician: Learned Estimator for Number of Distinct Values
Summary: Proposes a supervised-learning estimator for the number of distinct values (NDV) that replaces heuristic sample-based estimators with a data-driven model. Because the model is trained only on synthetic data, it is workload-agnostic and can be deployed as a UDF. It outperforms existing estimators on nine real datasets; code is available.
(summarized by gpt-5-nano on Feb 09 2026)
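To make the contrast in the summary concrete, here is a minimal sketch, not the authors' code. It shows the statistic a sample-based NDV estimator typically works from, the sample frequency profile (f_j = number of distinct values seen exactly j times in the sample). It then applies one classic heuristic to that profile: GEE from "Towards Estimation Error Guarantees for Distinct Values" (PODS 2000, in the cited papers below). It also sketches how synthetic (sample, true NDV) pairs could be generated for supervised training. The function names, the fixed-length feature layout, and the Zipf-like column generator are hypothetical choices made for illustration. No model training is shown.

```python
import math
import random
from collections import Counter


def frequency_profile(sample):
    """Return {j: f_j}: how many distinct values occur exactly j times in the sample."""
    per_value = Counter(sample)           # value -> occurrences in the sample
    return Counter(per_value.values())    # j -> number of values seen exactly j times


def gee_estimate(sample, population_size):
    """GEE heuristic (Charikar et al., PODS 2000): sqrt(N / n) * f_1 + sum_{j >= 2} f_j."""
    n = len(sample)
    f = frequency_profile(sample)
    singletons = f.get(1, 0)
    seen_repeatedly = sum(c for j, c in f.items() if j >= 2)
    return math.sqrt(population_size / n) * singletons + seen_repeatedly


def profile_features(sample, population_size, max_freq=10):
    """Hypothetical model input: [N, n, f_1, ..., f_{max_freq-1}, f_{>=max_freq}]."""
    f = frequency_profile(sample)
    head = [f.get(j, 0) for j in range(1, max_freq)]
    tail = sum(c for j, c in f.items() if j >= max_freq)  # high-frequency values share one bucket
    return [population_size, len(sample)] + head + [tail]


def synthetic_sample(rng, population_size, ndv, sample_rate, skew=1.0):
    """Hypothetical generator of one training pair (sample, true NDV).

    Builds a column with exactly `ndv` distinct values (each appears at least once,
    the remaining rows follow a Zipf-like skew), then samples it uniformly.
    """
    if not 1 <= ndv <= population_size:
        raise ValueError("need 1 <= ndv <= population_size")
    values = list(range(ndv))
    weights = [1.0 / (i + 1) ** skew for i in values]
    column = values + rng.choices(values, weights=weights, k=population_size - ndv)
    rng.shuffle(column)
    n = max(1, int(population_size * sample_rate))
    return rng.sample(column, n), ndv     # uniform sample without replacement


if __name__ == "__main__":
    rng = random.Random(42)
    N = 100_000
    sample, true_ndv = synthetic_sample(rng, N, ndv=5_000, sample_rate=0.01)
    print("features:", profile_features(sample, N))
    print("GEE estimate:", round(gee_estimate(sample, N)), "true NDV:", true_ndv)
```

In a supervised setup like the one summarized, many such pairs (varying N, NDV, skew, and sampling rate) would become `(profile_features(...), true_ndv)` training examples for a regressor. A heuristic such as GEE instead applies one fixed formula to the same profile.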
- Paper ID: 12759
- Venue: VLDB
- Year: 2022
- Pagerank: 4.6965039e-05
- Overall Rank: 7,610 | 47.06%
- DOI: 10.14778/3489496.3489508
Incoming Citations (Sorted by Pagerank): 9 citing papers.
Outgoing Citations (Sorted by Pagerank): 13 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 59 | Sampling-Based Estimation of the Number of Distinct Values of an Attribute | 1995 | VLDB | 0.00064501896 |
| 204 | Learned Cardinalities: Estimating Correlated Joins with Deep Learning | 2019 | CIDR | 0.00034784455 |
| 378 | Towards Estimation Error Guarantees for Distinct Values | 2000 | PODS | 0.0002497492 |
| 608 | DeepDB: Learn from Data, not from Queries! | 2020 | VLDB | 0.00019235898 |
| 1,254 | Selectivity Estimation for Range Predicates using Lightweight Models | 2019 | VLDB | 0.00013027411 |
| 1,574 | Approximate Query Processing: No Silver Bullet | 2017 | SIGMOD | 0.00011287495 |
| 1,683 | Cardinality Estimation: An Experimental Survey | 2018 | VLDB | 0.00010922679 |
| 1,703 | Are We Ready For Learned Cardinality Estimation? | 2021 | VLDB | 0.00010836769 |
| 2,762 | FLAT: Fast, Lightweight and Accurate Method for Cardinality Estimation | 2021 | VLDB | 8.1585394e-05 |
| 2,841 | Selectivity Estimation in Extensible Databases - A Neural Network Approach | 1998 | VLDB | 8.0287389e-05 |
| 2,969 | Estimating Join Selectivities using Bandwidth-Optimized Kernel Density Models | 2017 | VLDB | 7.7974762e-05 |
| 3,954 | Efficiently Approximating Selectivity Functions using Low Overhead Regression Models | 2020 | VLDB | 6.5926838e-05 |
| 6,244 | Approximate Distinct Counts for Billions of Datasets | 2019 | SIGMOD | 5.139669e-05 |
Semantically Similar Papers