PLM4NDV: Minimizing Data Access for Number of Distinct Values Estimation with Pre-trained Language Models
Summary: Uses schema semantics, captured with pre-trained language models (PLMs), to estimate the number of distinct values (NDV) while reducing data access. PLM4NDV combines the semantics of the target column with those of its table to lower access costs, can operate with no data access at all, and outperforms baselines on large real-world datasets.
(summarized by gpt-5-nano on Feb 09 2026)
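For context on what "data access" costs in NDV estimation, a classical sampling-based estimator reads a random sample of the column and extrapolates from the sample's frequency profile. The sketch below implements one such estimator, GEE from "Towards Estimation Error Guarantees for Distinct Values" (listed among the cited papers), which scales the count of singleton values by sqrt(N/n) and adds the count of values seen at least twice. This is an illustrative baseline only, not PLM4NDV's method; the function name `gee_estimate` is hypothetical.

```python
import math
from collections import Counter


def gee_estimate(sample, population_size):
    """GEE estimate of the number of distinct values in a column of
    `population_size` rows, given a uniform random `sample` of that column."""
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")
    # How many times each distinct value appears in the sample.
    value_counts = Counter(sample)
    # f_i: number of distinct values that appear exactly i times in the sample.
    freq_of_freq = Counter(value_counts.values())
    f1 = freq_of_freq.get(1, 0)
    # Values seen two or more times are assumed to be fully observed.
    seen_repeatedly = sum(c for i, c in freq_of_freq.items() if i >= 2)
    # Singletons are scaled up by sqrt(N / n).
    return math.sqrt(population_size / n) * f1 + seen_repeatedly


# An 8-row sample drawn from a 32-row column.
print(gee_estimate(["a", "a", "b", "c", "d", "d", "d", "e"], 32))
```

Here f1 = 3 (b, c, e) and two values repeat (a, d), so the estimate is sqrt(32/8) * 3 + 2 = 8.0. When the sample is the whole column (n = N), the scale factor is 1 and the estimate equals the exact distinct count.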
- Paper ID
- 7255
- Venue
- SIGMOD
- Year
- 2025
- Pagerank
- 4.1945683e-05
- Overall Rank
- 10,498 | 26.97%
- DOI
- 10.1145/3725336
Incoming Non-self Citations Over Time
No non-self incoming citations found for this paper in this database.
Incoming Citations (Sorted by Pagerank)
Showing 0 of 0 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 26 of 26 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1 | Access Path Selection in a Relational Database Management System | 1979 | SIGMOD | 0.0040449103 |
| 59 | Sampling-Based Estimation of the Number of Distinct Values of an Attribute | 1995 | VLDB | 0.00064501896 |
| 221 | Deep Entity Matching with Pre-Trained Language Models | 2021 | VLDB | 0.00033121824 |
| 369 | Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation | 2024 | VLDB | 0.0002547515 |
| 378 | Towards Estimation Error Guarantees for Distinct Values | 2000 | PODS | 0.0002497492 |
| 513 | TURL: Table Understanding through Representation Learning | 2021 | VLDB | 0.00021288342 |
| 530 | Random Sampling for Histogram Construction: How much is enough? | 1998 | SIGMOD | 0.00020803682 |
| 629 | Preventing Bad Plans by Bounding the Impact of Cardinality Estimation Errors | 2009 | VLDB | 0.00018942366 |
| 727 | On Synopses for Distinct-Value Estimation Under Multiset Operations | 2007 | SIGMOD | 0.00017508726 |
| 1,683 | Cardinality Estimation: An Experimental Survey | 2018 | VLDB | 0.00010922679 |
| 1,797 | Effective Use of Block-Level Sampling in Statistics Estimation | 2004 | SIGMOD | 0.00010523169 |
| 1,956 | D-Bot: Database Diagnosis System using Large Language Models | 2024 | VLDB | 9.960627e-05 |
| 2,517 | Annotating Columns with Pre-trained Language Models | 2022 | SIGMOD | 8.6092139e-05 |
| 2,945 | Few-shot Text-to-SQL Translation using Structure and Content Prompt Learning | 2023 | SIGMOD | 7.8377395e-05 |
| 3,520 | GitTables: A Large-Scale Corpus of Relational Tables | 2023 | SIGMOD | 7.0131061e-05 |
| 5,023 | GenRewrite: Query Rewriting via Large Language Models | 2026 | SIGMOD | 5.75363e-05 |
| 5,337 | Learned Index Benefits: Machine Learning Based Index Performance Estimation | 2022 | VLDB | 5.5635208e-05 |
| 5,401 | ALECE: An Attention-based Learned Cardinality Estimator for SPJ Queries on Dynamic Workloads | 2024 | VLDB | 5.5285035e-05 |
| 7,336 | Refactoring Index Tuning Process with Benefit Estimation | 2024 | VLDB | 4.7599411e-05 |
| 7,610 | Learning to be a Statistician: Learned Estimator for Number of Distinct Values | 2022 | VLDB | 4.6965039e-05 |
| 7,709 | UltraLogLog: A Practical and More Space-Efficient Alternative to HyperLogLog for Approximate Distinct Counting | 2024 | VLDB | 4.6720658e-05 |
| 8,393 | LAQy: Efficient and Reusable Query Approximations via Lazy Sampling | 2023 | SIGMOD | 4.5280102e-05 |
| 8,683 | FormaT5: Abstention and Examples for Conditional Table Formatting with Natural Language | 2024 | VLDB | 4.4686885e-05 |
| 8,834 | ByteCard: Enhancing ByteDance’s Data Warehouse with Learned Cardinality Estimation | 2024 | SIGMOD | 4.4394021e-05 |
| 8,835 | Learning-based Property Estimation with Polynomials | 2024 | SIGMOD | 4.4394021e-05 |
| 10,534 | AdaNDV: Adaptive Number of Distinct Value Estimation via Learning to Select and Fuse Estimators | 2025 | VLDB | 4.1945683e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 2,517 | Annotating Columns with Pre-trained Language Models | 2022 | SIGMOD | 8.6092139e-05 |
| 10,012 | A Fast, Mergeable, and LDP Compatible Sketch for Counting the Number of Distinct Values in Fully Dynamic Tables | 2026 | SIGMOD | 4.1945683e-05 |
| 6,368 | Pre-training Summarization Models of Structured Datasets for Cardinality Estimation | 2022 | VLDB | 5.0937722e-05 |
| 10,973 | Unstructured Data Fusion for Schema and Data Extraction | 2024 | SIGMOD | 4.1945683e-05 |
| 2,364 | Deep Learning Models for Selectivity Estimation of Multi-Attribute Queries | 2020 | SIGMOD | 8.9554751e-05 |
| 7,186 | LPLM: A Neural Language Model for Cardinality Estimation of LIKE-Queries | 2024 | SIGMOD | 4.8063731e-05 |
| 8,835 | Learning-based Property Estimation with Polynomials | 2024 | SIGMOD | 4.4394021e-05 |
| 3,335 | DeepJoin: Joinable Table Discovery with Pre-trained Language Models | 2023 | VLDB | 7.2065006e-05 |
| 7,610 | Learning to be a Statistician: Learned Estimator for Number of Distinct Values | 2022 | VLDB | 4.6965039e-05 |
| 10,534 | AdaNDV: Adaptive Number of Distinct Value Estimation via Learning to Select and Fuse Estimators | 2025 | VLDB | 4.1945683e-05 |