Unveiling Challenges for LLMs in Enterprise Data Engineering

Summary: Identifies enterprise-specific obstacles for LLM-driven tabular data engineering: large tables, more complex tasks, and dependence on internal background knowledge. A systematic evaluation shows substantial accuracy degradation and exposes the practical limits of current LLMs in real-world enterprise settings. (summarized by gpt-5-mini on Mar 13 2026)

Paper ID: 14325
Venue: VLDB
Year: 2026
Pagerank: 4.4520434e-05
Overall Rank: 8,732 | 39.32%
DOI: 10.14778/3773749.3773758

Incoming Citations (Sorted by Pagerank)

Showing 1 of 1 citing papers.

Rank	Citing Paper	Year	Venue	Pagerank
10,183	Mixtera: A Data Plane for Foundation Model Training	2026	SIGMOD	4.1905499e-05

Outgoing Citations (Sorted by Pagerank)

Showing 17 of 17 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank	Cited Paper	Year	Venue	Pagerank
293	Deep Learning for Entity Matching: A Design Space Exploration	2018	SIGMOD	0.00028661817
366	Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation	2024	VLDB	0.00025580097
514	TURL: Table Understanding through Representation Learning	2021	VLDB	0.00021280726
516	Can Foundation Models Wrangle Your Data?	2023	VLDB	0.00021194444
915	On Schema Matching with Opaque Column Names and Data Values	2003	SIGMOD	0.00015362622
997	CAESURA: Language Models as Multi-Modal Query Planners	2024	CIDR	0.00014726927
1,185	JOSIE: Overlap Set Similarity Search for Finding Joinable Tables in Data Lakes	2019	SIGMOD	0.00013432692
2,013	Palimpzest: Optimizing AI-Powered Analytics with Declarative Query Processing	2025	CIDR	9.7986166e-05
2,585	Table-GPT: Table Fine-tuned GPT for Diverse Table Tasks	2024	SIGMOD	8.4909917e-05
2,895	Sato: Contextual Semantic Type Detection in Tables	2020	VLDB	7.9539265e-05
3,003	Chorus: Foundation Models for Unified Data Discovery and Exploration	2024	VLDB	7.7358219e-05
3,520	GitTables: A Large-Scale Corpus of Relational Tables	2023	SIGMOD	7.0136102e-05
4,462	Magellan: Toward Building Entity Matching Management Systems over Data Science Stacks	2016	VLDB	6.1566477e-05
7,027	Mind the Data Gap: Bridging LLMs to Enterprise Data Integration	2025	CIDR	4.8524216e-05
7,045	Magneto: Combining Small and Large Language Models for Schema Matching	2025	VLDB	4.8474104e-05
8,054	Generating Succinct Descriptions of Database Schemata for Cost-Efficient Prompting of Large Language Models	2024	VLDB	4.5909042e-05
9,516	Automating the Enterprise with Foundation Models	2024	VLDB	4.3294347e-05

Semantically Similar Papers

Overall Rank	Paper	Year	Venue	Pagerank
2,987	NL2SQL is a solved problem... Not!	2024	CIDR	7.77529e-05
366	Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation	2024	VLDB	0.00025580097
8,970	DataLoom: Simplifying Data Loading with LLMs	2024	VLDB	4.4154448e-05
10,462	ScaleLLM: A Technique for Scalable LLM-augmented Data Systems	2025	SIGMOD	4.1905499e-05
1,088	Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes	2024	VLDB	0.00014158762
10,022	In-context Clustering-based Entity Resolution with Large Language Models: A Design Space Exploration	2026	SIGMOD	4.1905499e-05
3,803	Revisiting Prompt Engineering via Declarative Crowdsourcing	2024	CIDR	6.7498941e-05
7,027	Mind the Data Gap: Bridging LLMs to Enterprise Data Integration	2025	CIDR	4.8524216e-05
3,982	How Large Language Models Will Disrupt Data Management	2023	VLDB	6.5595332e-05
7,016	LLM for Data Management	2024	VLDB	4.8561622e-05