Can Foundation Models Wrangle Your Data?
Summary: Demonstrates that large foundation models, via prompting without task-specific fine-tuning, can generalize to five classical data cleaning and integration tasks and achieve state-of-the-art performance. Identifies limits on private/domain data and integration challenges for data management (DM) systems. (summarized by gpt-5-mini on Feb 09 2026)
Incoming Non-self Citations Over Time
Authors
- 1. Avanika Narayan
- 2. Ines Chami
- 3. Laurel Orr
- 4. Christopher Ré
Incoming Citations (Sorted by Pagerank)
Showing 5 of 55 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 10,973 | Unstructured Data Fusion for Schema and Data Extraction | 2024 | SIGMOD | 4.1945683e-05 |
| 11,047 | Blocker and Matcher Can Mutually Benefit: A Co-Learning Framework for Low-Resource Entity Resolution | 2024 | VLDB | 4.1945683e-05 |
| 11,054 | Enriching Relations with Additional Attributes for ER | 2024 | VLDB | 4.1945683e-05 |
| 11,137 | Generalizable Data Cleaning of Tabular Data in Latent Space | 2024 | VLDB | 4.1945683e-05 |
| 11,297 | DataRinse: Semantic Transforms for Data preparation based on Code Mining | 2023 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 15 of 15 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 10,595 | Optimized Batch Prompting for Cost-effective LLMs | 2025 | VLDB | 4.1945683e-05 |
| 1,116 | Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes | 2024 | VLDB | 0.00013890154 |
| 10,973 | Unstructured Data Fusion for Schema and Data Extraction | 2024 | SIGMOD | 4.1945683e-05 |
| 7,026 | Mind the Data Gap: Bridging LLMs to Enterprise Data Integration | 2025 | CIDR | 4.8570811e-05 |
| 7,020 | LLM for Data Management | 2024 | VLDB | 4.8595728e-05 |
| 3,840 | Revisiting Prompt Engineering via Declarative Crowdsourcing | 2024 | CIDR | 6.7106924e-05 |
| 9,515 | Automating the Enterprise with Foundation Models | 2024 | VLDB | 4.3335877e-05 |
| 3,015 | Chorus: Foundation Models for Unified Data Discovery and Exploration | 2024 | VLDB | 7.7092391e-05 |
| 11,317 | Data Management Opportunities for Foundation Models | 2022 | CIDR | 4.1945683e-05 |
| 8,847 | Towards Foundation Database Models | 2025 | CIDR | 4.4371897e-05 |