| 517 | Can Foundation Models Wrangle Your Data? | 2023 | VLDB | 0.00021169035 |
| 2,349 | RPT: Relational Pre-trained Transformer Is Almost All You Need towards Democratizing Data Preparation | 2021 | VLDB | 8.9876423e-05 |
| 2,517 | Annotating Columns with Pre-trained Language Models | 2022 | SIGMOD | 8.6092139e-05 |
| 2,587 | Table-GPT: Table Fine-tuned GPT for Diverse Table Tasks | 2024 | SIGMOD | 8.4924618e-05 |
| 2,836 | Semantics-aware Dataset Discovery from Data Lakes with Contextualized Column-based Representation Learning | 2023 | VLDB | 8.0443826e-05 |
| 3,000 | SANTOS: Relationship-based Semantic Table Union Search | 2023 | SIGMOD | 7.7462128e-05 |
| 3,015 | Chorus: Foundation Models for Unified Data Discovery and Exploration | 2024 | VLDB | 7.7092391e-05 |
| 3,335 | DeepJoin: Joinable Table Discovery with Pre-trained Language Models | 2023 | VLDB | 7.2065006e-05 |
| 3,520 | GitTables: A Large-Scale Corpus of Relational Tables | 2023 | SIGMOD | 7.0131061e-05 |
| 3,995 | How Large Language Models Will Disrupt Data Management | 2023 | VLDB | 6.5513237e-05 |
| 4,212 | Unicorn: A Unified Multi-tasking Model for Supporting Matching Tasks in Data Integration | 2023 | SIGMOD | 6.3555142e-05 |
| 4,661 | PreQR: Pre-training Representation for SQL Understanding | 2022 | SIGMOD | 6.0137947e-05 |
| 4,859 | Integrating Data Lake Tables | 2023 | VLDB | 5.8732433e-05 |
| 4,967 | Leva: Boosting Machine Learning Performance with Relational Embedding Data Augmentation | 2022 | SIGMOD | 5.7956612e-05 |
| 5,099 | ArcheType: A Novel Framework for Open-Source Column Type Annotation using Large Language Models | 2024 | VLDB | 5.6997784e-05 |
| 5,275 | Auto-Tables: Synthesizing Multi-Step Transformations to Relationalize Tables without Using Examples | 2023 | VLDB | 5.5905507e-05 |
| 5,449 | Transformers for Tabular Data Representation: A Tutorial on Models and Applications | 2022 | VLDB | 5.5008652e-05 |
| 5,840 | Logical and Physical Optimizations for SQL Query Execution over Large Language Models | 2025 | SIGMOD | 5.3042561e-05 |
| 6,092 | Observatory: Characterizing Embeddings of Relational Tables | 2024 | VLDB | 5.2138566e-05 |
| 6,280 | Self-supervised and Interpretable Data Cleaning with Sequence Generative Adversarial Networks | 2023 | VLDB | 5.1290457e-05 |
| 6,775 | A Unified Transferable Model for ML-Enhanced DBMS | 2022 | CIDR | 4.9299192e-05 |
| 6,800 | DTT: An Example-Driven Tabular Transformer for Joinability by Leveraging Large Language Models | 2024 | SIGMOD | 4.9231471e-05 |
| 7,643 | Cross Modal Data Discovery over Structured and Unstructured Data Lakes | 2023 | VLDB | 4.6901105e-05 |
| 8,193 | WarpGate: A Semantic Join Discovery System for Cloud Data Warehouses | 2023 | CIDR | 4.5618596e-05 |
| 8,579 | RECA: Related Tables Enhanced Column Semantic Type Annotation Framework | 2023 | VLDB | 4.4922446e-05 |
| 8,712 | ANN Softmax: Acceleration of Extreme Classification Training | 2022 | VLDB | 4.4626362e-05 |
| 8,736 | Unveiling Challenges for LLMs in Enterprise Data Engineering | 2026 | VLDB | 4.456315e-05 |
| 8,743 | CtxPipe: Context-aware Data Preparation Pipeline Construction for Machine Learning | 2024 | SIGMOD | 4.456315e-05 |
| 8,847 | Towards Foundation Database Models | 2025 | CIDR | 4.4371897e-05 |
| 8,852 | Watchog: A Light-weight Contrastive Learning based Framework for Column Annotation | 2023 | SIGMOD | 4.4356508e-05 |
| 8,913 | Making Table Understanding Work in Practice | 2022 | CIDR | 4.427232e-05 |
| 9,077 | VerifAI: Verified Generative AI | 2024 | CIDR | 4.4010762e-05 |
| 9,348 | GIDCL: A Graph-Enhanced Interpretable Data Cleaning Framework with Large Language Models | 2024 | SIGMOD | 4.3526427e-05 |
| 9,399 | TabulaX: Leveraging Large Language Models for Multi-Class Table Transformations | 2025 | VLDB | 4.3441378e-05 |
| 9,479 | Data Imputation with Limited Data Redundancy Using Data Lakes | 2025 | VLDB | 4.3341665e-05 |
| 9,777 | Data Augmentation for ML-driven Data Preparation and Integration | 2021 | VLDB | 4.2856106e-05 |
| 9,961 | QueryArtisan: Generating Data Manipulation Codes for Ad-hoc Analysis in Data Lakes | 2025 | VLDB | 4.2294678e-05 |
| 10,059 | Burr: A Benchmark for Ontology Learning from Relational Databases | 2026 | SIGMOD | 4.1945683e-05 |
| 10,109 | Retrieve-and-Verify: A Table Context Selection Framework for Accurate Column Annotations | 2026 | SIGMOD | 4.1945683e-05 |
| 10,142 | AutoDDG: Automated Dataset Description Generation using Large Language Models | 2026 | SIGMOD | 4.1945683e-05 |
| 10,197 | Qualitative Join Discovery in Data Lakes using Examples | 2026 | SIGMOD | 4.1945683e-05 |
| 10,268 | OpenSQL: Data-Efficient Text-to-SQL for Open-Source LLMs via Synthesized Intermediate Supervision | 2026 | VLDB | 4.1945683e-05 |
| 10,498 | PLM4NDV: Minimizing Data Access for Number of Distinct Values Estimation with Pre-trained Language Models | 2025 | SIGMOD | 4.1945683e-05 |
| 10,510 | Table Overlap Estimation through Graph Embeddings | 2025 | SIGMOD | 4.1945683e-05 |
| 10,589 | Birdie: Natural Language-Driven Table Discovery Using Differentiable Search Index | 2025 | VLDB | 4.1945683e-05 |
| 10,753 | Cents: A Flexible and Cost-Effective Framework for LLM-Based Table Understanding | 2025 | VLDB | 4.1945683e-05 |
| 10,754 | OmniMatch: Joinability Discovery in Data Products | 2025 | VLDB | 4.1945683e-05 |
| 10,844 | Panel on Neural Relational Data: Tabular Foundation Models, LLMs... or both? | 2025 | VLDB | 4.1945683e-05 |
| 10,951 | Determining the Largest Overlap between Tables | 2024 | SIGMOD | 4.1945683e-05 |
| 10,973 | Unstructured Data Fusion for Schema and Data Extraction | 2024 | SIGMOD | 4.1945683e-05 |