AutoDDG: Automated Dataset Description Generation using Large Language Models
Summary: AutoDDG generates descriptions for tabular datasets by combining data-driven content summarization with LLM-based semantic enrichment. It targets missing or inaccurate metadata in data lakes and open data portals. The paper proposes a multi-faceted evaluation (retrieval-based, reference-based, reference-free, and human) and shows that the generated descriptions improve dataset search and retrieval at scale.
(summarized by gpt-5-mini on Apr 11 2026)
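The summary describes two stages: a data-driven summary of the table's content, followed by LLM-based semantic enrichment. The following is a minimal Python sketch of the first stage and of one way its output could be packed into an LLM prompt, assuming CSV input. The helper names `profile_table` and `build_prompt`, the profile fields (`non_null`, `distinct`, `min`/`max`, sample values), and the prompt wording are all hypothetical illustrations, not AutoDDG's actual features or prompts. The LLM call itself is omitted.

```python
import csv
import io
from collections import Counter


def profile_table(csv_text: str, max_samples: int = 3) -> list[dict]:
    """Summarize each column of a CSV table (hypothetical feature set, not AutoDDG's)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    profiles = []
    for col in reader.fieldnames or []:
        # Treat empty cells (and cells missing from short rows) as missing values.
        values = [r.get(col) for r in rows if r.get(col) not in ("", None)]
        # A column counts as numeric only if every non-missing value parses as a float.
        numbers = []
        for v in values:
            try:
                numbers.append(float(v))
            except ValueError:
                numbers = []
                break
        profile = {"name": col, "non_null": len(values), "distinct": len(set(values))}
        if numbers:
            profile.update(type="numeric", min=min(numbers), max=max(numbers))
        else:
            # Keep the most frequent values as examples of the column's content.
            samples = [v for v, _ in Counter(values).most_common(max_samples)]
            profile.update(type="text", samples=samples)
        profiles.append(profile)
    return profiles


def build_prompt(title: str, profiles: list[dict]) -> str:
    """Render the column profiles as an LLM prompt (hypothetical template)."""
    lines = [f"Dataset: {title}", "Columns:"]
    for p in profiles:
        if p["type"] == "numeric":
            detail = f"numeric, range {p['min']:g} to {p['max']:g}"
        else:
            detail = "text, e.g. " + ", ".join(p["samples"])
        lines.append(
            f"- {p['name']} ({detail}; {p['non_null']} non-null, {p['distinct']} distinct)"
        )
    lines.append("Write a concise description of this dataset, its topic, and likely uses.")
    return "\n".join(lines)
```

For example, calling `build_prompt("US city populations", profile_table(csv_text))` on a small city/year/population table lists each column with its inferred type, its range or sample values, and its counts. Keeping the profile separate from the prompt template makes it easy to vary which statistics the LLM sees without touching the profiling code.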
- Paper ID: 7452
- Venue: SIGMOD
- Year: 2026
- Pagerank: 4.1945683e-05
- Overall Rank: 10,142 | 29.45%
- DOI: 10.1145/3786626
Incoming Non-self Citations Over Time
No non-self incoming citations found for this paper in this database.
Incoming Citations (Sorted by Pagerank)
Showing 0 of 0 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
Outgoing Citations (Sorted by Pagerank)
Showing 12 of 12 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 513 | TURL: Table Understanding through Representation Learning | 2021 | VLDB | 0.00021288342 |
| 939 | Data Lake Management: Challenges and Opportunities | 2019 | VLDB | 0.00015187344 |
| 1,187 | JOSIE: Overlap Set Similarity Search for Finding Joinable Tables in Data Lakes | 2019 | SIGMOD | 0.00013443639 |
| 1,751 | Auctus: A Dataset Search Engine for Data Discovery and Augmentation | 2021 | VLDB | 0.00010683295 |
| 2,517 | Annotating Columns with Pre-trained Language Models | 2022 | SIGMOD | 8.6092139e-05 |
| 2,888 | Sato: Contextual Semantic Type Detection in Tables | 2020 | VLDB | 7.9594996e-05 |
| 3,000 | SANTOS: Relationship-based Semantic Table Union Search | 2023 | SIGMOD | 7.7462128e-05 |
| 3,015 | Chorus: Foundation Models for Unified Data Discovery and Exploration | 2024 | VLDB | 7.7092391e-05 |
| 3,520 | GitTables: A Large-Scale Corpus of Relational Tables | 2023 | SIGMOD | 7.0131061e-05 |
| 3,824 | Correlation Sketches for Approximate Join-Correlation Queries | 2021 | SIGMOD | 6.7260705e-05 |
| 5,099 | ArcheType: A Novel Framework for Open-Source Column Type Annotation using Large Language Models | 2024 | VLDB | 5.6997784e-05 |
| 6,217 | Pneuma: Leveraging LLMs for Tabular Data Representation and Retrieval in an End-to-End System | 2025 | SIGMOD | 5.1534752e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 6,800 | DTT: An Example-Driven Tabular Transformer for Joinability by Leveraging Large Language Models | 2024 | SIGMOD | 4.9231471e-05 |
| 5,529 | Data-Driven Domain Discovery for Structured Datasets | 2020 | VLDB | 5.4566641e-05 |
| 11,392 | Automated Relational Data Explanation using External Semantic Knowledge | 2022 | VLDB | 4.1945683e-05 |
| 13,098 | Demonstrating CatDB: LLM-based Generation of Data-centric ML Pipelines | 2025 | SIGMOD | - |
| 10,628 | CatDB: Data-catalog-guided, LLM-based Generation of Data-centric ML Pipelines | 2025 | VLDB | 4.1945683e-05 |
| 9,476 | Adda: Towards Efficient in-Database Feature Generation via LLM-based Agents | 2025 | SIGMOD | 4.3341665e-05 |
| 10,316 | LLM-AutoDP: Automatic Data Processing via LLM Agents for Model Fine-tuning | 2026 | VLDB | 4.1945683e-05 |
| 8,155 | Automated Data Visualization from Natural Language via Large Language Models: An Exploratory Study | 2024 | SIGMOD | 4.5745248e-05 |
| 10,329 | Revisiting Task-Oriented Dataset Search in the Era of Large Language Models: Challenges, Benchmark, and Solution | 2026 | VLDB | 4.1945683e-05 |
| 10,973 | Unstructured Data Fusion for Schema and Data Extraction | 2024 | SIGMOD | 4.1945683e-05 |