Mind the Data Gap: Bridging LLMs to Enterprise Data Integration
Summary: LLM-based integration methods trained on public corpora falter on enterprise “dark” data, and current public benchmarks overestimate real-world performance. The paper presents the Goby Benchmark and three remedies (hierarchical annotation, runtime class-learning, and ontology synthesis) that restore LLM performance on enterprise integration to parity with public-data scenarios. (summarized by gpt-5-mini on Feb 09 2026)
Authors
1. Moe Kayali
2. Fabian Wenz
3. Nesime Tatbul
4. Çağatay Demiralp
Incoming Citations (Sorted by Pagerank)
Showing 2 of 2 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 8,736 | Unveiling Challenges for LLMs in Enterprise Data Engineering | 2026 | VLDB | 4.456315e-05 |
| 9,977 | A Vision for Autonomous Data Agent Collaboration: From Query-by-Integration to Query-by-Collaboration | 2026 | CIDR | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 8 of 8 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 107 | WebTables: Exploring the Power of Tables on the Web | 2008 | VLDB | 0.00048377684 |
| 489 | Data Curation at Scale: The Data Tamer System | 2013 | CIDR | 0.00022030728 |
| 517 | Can Foundation Models Wrangle Your Data? | 2023 | VLDB | 0.00021169035 |
| 1,116 | Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes | 2024 | VLDB | 0.00013890154 |
| 2,517 | Annotating Columns with Pre-trained Language Models | 2022 | SIGMOD | 8.6092139e-05 |
| 3,015 | Chorus: Foundation Models for Unified Data Discovery and Exploration | 2024 | VLDB | 7.7092391e-05 |
| 3,520 | GitTables: A Large-Scale Corpus of Relational Tables | 2023 | SIGMOD | 7.0131061e-05 |
| 6,890 | Towards NLP-Enhanced Data Profiling Tools | 2022 | CIDR | 4.8928923e-05 |