How Large Language Models Will Disrupt Data Management
Summary: LLMs provide semantic grounding for tuples, schemas, and queries. This enables automation breakthroughs on long-stalled tasks such as entity resolution, schema matching, data discovery, and query synthesis. LLMs also blur the line between predictive models and information retrieval (IR), which calls for new database and system-architecture designs.
(summarized by gpt-5-mini on Feb 09 2026)
- Paper ID: 13166
- Venue: VLDB
- Year: 2023
- Pagerank: 6.5513237e-05
- Overall Rank: 3,995 | 72.21%
- DOI: 10.14778/3611479.3611527
Incoming Citations (Sorted by Pagerank)
Showing 12 of 12 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1,963 | DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing | 2025 | VLDB | 9.929429e-05 |
| 2,587 | Table-GPT: Table Fine-tuned GPT for Diverse Table Tasks | 2024 | SIGMOD | 8.4924618e-05 |
| 3,508 | spade: Synthesizing Data Quality Assertions for Large Language Model Pipelines | 2024 | VLDB | 7.0271496e-05 |
| 5,658 | Databases Unbound: Querying All of the World's Bytes with AI | 2024 | VLDB | 5.385675e-05 |
| 5,928 | SchemaPile: A Large Collection of Relational Database Schemas | 2024 | SIGMOD | 5.2685946e-05 |
| 9,032 | Sphinteract: Resolving Ambiguities in NL2SQL Through User Interaction | 2025 | VLDB | 4.4039656e-05 |
| 9,991 | The Pneuma Project: Reifying Information Needs as Relational Schemas to Automate Discovery, Guide Preparation, and Align Data with Intent | 2026 | CIDR | 4.1945683e-05 |
| 10,064 | Cut Costs, Not Accuracy: LLM-Powered Data Processing with Guarantees | 2026 | SIGMOD | 4.1945683e-05 |
| 10,595 | Optimized Batch Prompting for Cost-effective LLMs | 2025 | VLDB | 4.1945683e-05 |
| 10,658 | LLMLog: Advanced Log Template Generation via LLM-driven Multi-Round Annotation | 2025 | VLDB | 4.1945683e-05 |
| 10,835 | Large Language Models for Spatial Analysis Queries | 2025 | VLDB | 4.1945683e-05 |
| 11,058 | LLM-PBE: Assessing Data Privacy in Large Language Models | 2024 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 22 of 22 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 31 | Provenance Semirings | 2007 | PODS | 0.0007857786 |
| 221 | Deep Entity Matching with Pre-Trained Language Models | 2021 | VLDB | 0.00033121824 |
| 513 | TURL: Table Understanding through Representation Learning | 2021 | VLDB | 0.00021288342 |
| 517 | Can Foundation Models Wrangle Your Data? | 2023 | VLDB | 0.00021169035 |
| 518 | Data Integration for the Relational Web | 2009 | VLDB | 0.00021158934 |
| 567 | NaLIR: An Interactive Natural Language Interface for Querying Relational Databases | 2014 | SIGMOD | 0.00019966681 |
| 667 | Incremental Knowledge Base Construction Using DeepDive | 2015 | VLDB | 0.00018440557 |
| 893 | Data Integration: The Teenage Years | 2006 | VLDB | 0.00015558352 |
| 1,147 | Web-scale Data Integration: You can only afford to Pay As You Go | 2007 | CIDR | 0.00013677658 |
| 1,407 | DB-BERT: A Database Tuning Tool that "Reads the Manual" | 2022 | SIGMOD | 0.00012146739 |
| 1,643 | CodexDB: Synthesizing Code for Query Processing from Natural Language Instructions using GPT-3 Codex | 2022 | VLDB | 0.0001104256 |
| 2,152 | MISTIQUE: A System to Store and Query Model Intermediates for Model Diagnosis | 2018 | SIGMOD | 9.4239787e-05 |
| 2,352 | MiCS: Near-linear Scaling for Training Gigantic Model on Public Cloud | 2023 | VLDB | 8.9766205e-05 |
| 2,888 | Sato: Contextual Semantic Type Detection in Tables | 2020 | VLDB | 7.9594996e-05 |
| 2,902 | PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel | 2023 | VLDB | 7.93939e-05 |
| 3,617 | Ava: From Data to Insights Through Conversation | 2017 | CIDR | 6.9091789e-05 |
| 4,180 | FastFlow: Accelerating Deep Learning Model Training with Smart Offloading of Input Data Pipeline | 2023 | VLDB | 6.3793352e-05 |
| 4,630 | Knowledge Graphs 2021: A Data Odyssey | 2021 | VLDB | 6.0348379e-05 |
| 4,967 | Leva: Boosting Machine Learning Performance with Relational Embedding Data Augmentation | 2022 | SIGMOD | 5.7956612e-05 |
| 6,377 | Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism | 2023 | VLDB | 5.0911095e-05 |
| 7,868 | Solo: Data Discovery Using Natural Language Questions Via A Self-Supervised Approach | 2023 | SIGMOD | 4.6319504e-05 |
| 8,615 | The Case for NLP-Enhanced Database Tuning: Towards Tuning Tools that "Read the Manual" | 2021 | VLDB | 4.484683e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 13,146 | Turning Databases Into Generative AI Machines | 2024 | CIDR | - |
| 10,844 | Panel on Neural Relational Data: Tabular Foundation Models, LLMs... or both? | 2025 | VLDB | 4.1945683e-05 |
| 2,988 | NL2SQL is a solved problem... Not! | 2024 | CIDR | 7.7761714e-05 |
| 5,658 | Databases Unbound: Querying All of the World's Bytes with AI | 2024 | VLDB | 5.385675e-05 |
| 4,934 | From BERT to GPT-3 Codex: Harnessing the Potential of Very Large Language Models for Data Management | 2022 | VLDB | 5.8198826e-05 |
| 1,116 | Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes | 2024 | VLDB | 0.00013890154 |
| 13,138 | Database Perspective on LLM Inference Systems | 2025 | VLDB | - |
| 9,243 | Demonstration of DB-GPT: Next Generation Data Interaction System Empowered by Large Language Models | 2024 | VLDB | 4.3690661e-05 |
| 8,736 | Unveiling Challenges for LLMs in Enterprise Data Engineering | 2026 | VLDB | 4.456315e-05 |
| 7,020 | LLM for Data Management | 2024 | VLDB | 4.8595728e-05 |