Database Paper Browser

PLM4NDV: Minimizing Data Access for Number of Distinct Values Estimation with Pre-trained Language Models

Summary: PLM4NDV uses pre-trained language models to capture the semantics of a table's schema and estimate the number of distinct values (NDV) with reduced data access. It fuses target-column and table semantics to lower access costs, can operate with no data access, and outperforms baselines on large real-world datasets. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 7255
Venue: SIGMOD
Year: 2025
Pagerank: 4.1945683e-05
Overall Rank: 10,498 | 26.97%
DOI: 10.1145/3725336

Incoming Non-self Citations Over Time

No non-self incoming citations found for this paper in this database.

Authors

Incoming Citations (Sorted by Pagerank)

Showing 0 of 0 citing papers.

Outgoing Citations (Sorted by Pagerank)

Showing 26 of 26 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1 | Access Path Selection in a Relational Database Management System | 1979 | SIGMOD | 0.0040449103 |
| 59 | Sampling-Based Estimation of the Number of Distinct Values of an Attribute | 1995 | VLDB | 0.00064501896 |
| 221 | Deep Entity Matching with Pre-Trained Language Models | 2021 | VLDB | 0.00033121824 |
| 369 | Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation | 2024 | VLDB | 0.0002547515 |
| 378 | Towards Estimation Error Guarantees for Distinct Values | 2000 | PODS | 0.0002497492 |
| 513 | TURL: Table Understanding through Representation Learning | 2021 | VLDB | 0.00021288342 |
| 530 | Random Sampling for Histogram Construction: How much is enough? | 1998 | SIGMOD | 0.00020803682 |
| 629 | Preventing Bad Plans by Bounding the Impact of Cardinality Estimation Errors | 2009 | VLDB | 0.00018942366 |
| 727 | On Synopses for Distinct-Value Estimation Under Multiset Operations | 2007 | SIGMOD | 0.00017508726 |
| 1,683 | Cardinality Estimation: An Experimental Survey | 2018 | VLDB | 0.00010922679 |
| 1,797 | Effective Use of Block-Level Sampling in Statistics Estimation | 2004 | SIGMOD | 0.00010523169 |
| 1,956 | D-Bot: Database Diagnosis System using Large Language Models | 2024 | VLDB | 9.960627e-05 |
| 2,517 | Annotating Columns with Pre-trained Language Models | 2022 | SIGMOD | 8.6092139e-05 |
| 2,945 | Few-shot Text-to-SQL Translation using Structure and Content Prompt Learning | 2023 | SIGMOD | 7.8377395e-05 |
| 3,520 | GitTables: A Large-Scale Corpus of Relational Tables | 2023 | SIGMOD | 7.0131061e-05 |
| 5,023 | GenRewrite: Query Rewriting via Large Language Models | 2026 | SIGMOD | 5.75363e-05 |
| 5,337 | Learned Index Benefits: Machine Learning Based Index Performance Estimation | 2022 | VLDB | 5.5635208e-05 |
| 5,401 | ALECE: An Attention-based Learned Cardinality Estimator for SPJ Queries on Dynamic Workloads | 2024 | VLDB | 5.5285035e-05 |
| 7,336 | Refactoring Index Tuning Process with Benefit Estimation | 2024 | VLDB | 4.7599411e-05 |
| 7,610 | Learning to be a Statistician: Learned Estimator for Number of Distinct Values | 2022 | VLDB | 4.6965039e-05 |
| 7,709 | UltraLogLog: A Practical and More Space-Efficient Alternative to HyperLogLog for Approximate Distinct Counting | 2024 | VLDB | 4.6720658e-05 |
| 8,393 | LAQy: Efficient and Reusable Query Approximations via Lazy Sampling | 2023 | SIGMOD | 4.5280102e-05 |
| 8,683 | FormaT5: Abstention and Examples for Conditional Table Formatting with Natural Language | 2024 | VLDB | 4.4686885e-05 |
| 8,834 | ByteCard: Enhancing ByteDance’s Data Warehouse with Learned Cardinality Estimation | 2024 | SIGMOD | 4.4394021e-05 |
| 8,835 | Learning-based Property Estimation with Polynomials | 2024 | SIGMOD | 4.4394021e-05 |
| 10,534 | AdaNDV: Adaptive Number of Distinct Value Estimation via Learning to Select and Fuse Estimators | 2025 | VLDB | 4.1945683e-05 |

Semantically Similar Papers