From BERT to GPT-3 Codex: Harnessing the Potential of Very Large Language Models for Data Management
Summary: A survey of very large language models (VLLMs) for data management, detailing how they can be used for database tasks with minimal fine-tuning. It explores Codex-style code generation, libraries and APIs, LM-based architectures, and integration with traditional systems.
(summarized by gpt-5-nano on Feb 09 2026)
- Paper ID: 12885
- Venue: VLDB
- Year: 2022
- Pagerank: 5.8198826e-05
- Overall Rank: 4,934 | 65.68%
- DOI: 10.14778/3554821.3554896
Incoming Non-self Citations Over Time
Incoming Citations (Sorted by Pagerank)
Showing 10 of 10 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 5,509 | Can Large Language Models Predict Data Correlations from Column Names? | 2023 | VLDB | 5.4703368e-05 |
| 6,737 | Demonstrating GPT-DB: Generating Query-Specific and Customizable Code for SQL Processing with GPT-4 | 2023 | VLDB | 4.9457488e-05 |
| 7,052 | Pre-trained Embeddings for Entity Resolution: An Experimental Analysis | 2023 | VLDB | 4.8497453e-05 |
| 7,152 | Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity | 2024 | VLDB | 4.8154191e-05 |
| 8,186 | E2ETune: End-to-End Knob Tuning via Fine-tuned Generative Language Model | 2025 | VLDB | 4.5651684e-05 |
| 8,892 | Generation of Training Examples for Tabular Natural Language Inference | 2023 | SIGMOD | 4.4275457e-05 |
| 9,277 | DBG-PT: A Large Language Model Assisted Query Performance Regression Debugger | 2024 | VLDB | 4.3640804e-05 |
| 9,875 | A Universal Question-Answering Platform for Knowledge Graphs | 2023 | SIGMOD | 4.2667743e-05 |
| 10,835 | Large Language Models for Spatial Analysis Queries | 2025 | VLDB | 4.1945683e-05 |
| 11,058 | LLM-PBE: Assessing Data Privacy in Large Language Models | 2024 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 18 of 18 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 567 | NaLIR: An Interactive Natural Language Interface for Querying Relational Databases | 2014 | SIGMOD | 0.00019966681 |
| 1,407 | DB-BERT: A Database Tuning Tool that "Reads the Manual" | 2022 | SIGMOD | 0.00012146739 |
| 1,643 | CodexDB: Synthesizing Code for Query Processing from Natural Language Instructions using GPT-3 Codex | 2022 | VLDB | 0.0001104256 |
| 2,057 | From Natural Language Processing to Neural Databases | 2021 | VLDB | 9.6624862e-05 |
| 2,349 | RPT: Relational Pre-trained Transformer Is Almost All You Need towards Democratizing Data Preparation | 2021 | VLDB | 8.9876423e-05 |
| 3,473 | AI Meets Database: AI4DB and DB4AI | 2021 | SIGMOD | 7.062864e-05 |
| 3,635 | A Deep Dive into Deep Learning Approaches for Text-to-SQL Systems | 2021 | SIGMOD | 6.8981006e-05 |
| 3,942 | Ember: No-Code Context Enrichment via Similarity-Based Keyless Joins | 2022 | VLDB | 6.6114622e-05 |
| 5,281 | State of the Art and Open Challenges in Natural Language Interfaces to Data | 2020 | SIGMOD | 5.5896272e-05 |
| 5,861 | Machine Learning for Databases | 2021 | VLDB | 5.298883e-05 |
| 6,228 | Managing ML Pipelines: Feature Stores and the Coming Wave of Embedding Ecosystems | 2021 | VLDB | 5.1470042e-05 |
| 6,268 | Speedup Your Analytics: Automatic Parameter Tuning for Databases and Big Data Systems | 2019 | VLDB | 5.133857e-05 |
| 6,456 | From Auto-tuning One Size Fits All to Self-designed and Learned Data-intensive Systems | 2019 | SIGMOD | 5.0564619e-05 |
| 7,655 | Machine Learning for Cloud Data Systems: the Progress so far and the Path Forward | 2021 | VLDB | 4.6872456e-05 |
| 8,346 | Deep Learning: Systems and Responsibility | 2021 | SIGMOD | 4.5420668e-05 |
| 9,136 | TextCube: Automated Construction and Multidimensional Exploration | 2019 | VLDB | 4.3881065e-05 |
| 9,137 | Combating Fake News: A Data Management and Mining Perspective | 2019 | VLDB | 4.3881065e-05 |
| 11,384 | BABOONS: Black-Box Optimization of Data Summaries in Natural Language | 2022 | VLDB | 4.1945683e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1,643 | CodexDB: Synthesizing Code for Query Processing from Natural Language Instructions using GPT-3 Codex | 2022 | VLDB | 0.0001104256 |
| 13,138 | Database Perspective on LLM Inference Systems | 2025 | VLDB | - |
| 1,532 | Data Management in Machine Learning: Challenges, Techniques, and Systems | 2017 | SIGMOD | 0.00011472681 |
| 9,243 | Demonstration of DB-GPT: Next Generation Data Interaction System Empowered by Large Language Models | 2024 | VLDB | 4.3690661e-05 |
| 6,826 | Natural Language Interfaces for Databases with Deep Learning | 2023 | VLDB | 4.9142824e-05 |
| 5,449 | Transformers for Tabular Data Representation: A Tutorial on Models and Applications | 2022 | VLDB | 5.5008652e-05 |
| 13,173 | Harmonizing ML and Databases: A Symphony of Data (VLDB 2024 Keynote) | 2024 | VLDB | - |
| 7,020 | LLM for Data Management | 2024 | VLDB | 4.8595728e-05 |
| 3,995 | How Large Language Models Will Disrupt Data Management | 2023 | VLDB | 6.5513237e-05 |
| 5,455 | Natural Language Data Management and Interfaces: Recent Development and Open Challenges | 2017 | SIGMOD | 5.4977219e-05 |