CodexDB: Synthesizing Code for Query Processing from Natural Language Instructions using GPT-3 Codex

Summary: CodexDB uses GPT-3 Codex to synthesize query-processing code from natural-language instructions. It decomposes complex SQL queries into a sequence of processing steps described in natural language, augmented by user guidance and database properties. The prototype attains 81% on WikiSQL and 62% on SPIDER. (summarized by gpt-5-nano on Feb 09 2026)
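
The decomposition described in the summary can be illustrated with a minimal sketch. This is not CodexDB's actual prompt format. The PromptSpec class, the build_prompt function, the "orders" table, and the step wording are all hypothetical names chosen for illustration. The sketch only shows how database properties, user guidance, and numbered natural-language steps might be combined into one comment-style prompt for a code-generation model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PromptSpec:
    """Hypothetical container for the inputs named in the summary."""
    table_name: str                  # database property: table name
    columns: List[str]               # database property: column names
    steps: List[str]                 # natural-language processing steps
    user_guidance: List[str] = field(default_factory=list)


def build_prompt(spec: PromptSpec) -> str:
    """Render the spec as comment lines that a code model could complete."""
    lines = [f"# Table {spec.table_name} with columns: {', '.join(spec.columns)}"]
    lines += [f"# Instruction: {hint}" for hint in spec.user_guidance]
    lines += [f"# Step {i}: {step}" for i, step in enumerate(spec.steps, start=1)]
    lines.append("# Python code implementing the steps above:")
    return "\n".join(lines)


# Assumed example: a filter-then-aggregate query written as three steps.
spec = PromptSpec(
    table_name="orders",
    columns=["customer", "amount"],
    steps=[
        "Load the table from 'orders.csv'.",
        "Keep rows where amount is greater than 100.",
        "Count the remaining rows per customer.",
    ],
    user_guidance=["Use the pandas library."],
)
prompt = build_prompt(spec)
print(prompt)
```

In this sketch the user guidance is placed ahead of the steps so that a stylistic instruction (here, which library to use) would apply to every step. Whether the real system orders its prompt this way is an assumption.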

Paper ID: 12776
Venue: VLDB
Year: 2022
Pagerank: 0.00011071662
Overall Rank: 1,631 | 88.67%
DOI: 10.14778/3551793.3551841

Authors

1. Immanuel Trummer

Incoming Citations (Sorted by Pagerank)

Showing 12 of 12 citing papers.

Rank	Citing Paper	Year	Venue	Pagerank
366	Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation	2024	VLDB	0.00025580097
1,088	Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes	2024	VLDB	0.00014158762
3,982	How Large Language Models Will Disrupt Data Management	2023	VLDB	6.5595332e-05
4,916	From BERT to GPT-3 Codex: Harnessing the Potential of Very Large Language Models for Data Management	2022	VLDB	5.827729e-05
5,506	Can Large Language Models Predict Data Correlations from Column Names?	2023	VLDB	5.4711611e-05
6,712	Demonstrating GPT-DB: Generating Query-Specific and Customizable Code for SQL Processing with GPT-4	2023	VLDB	4.9474017e-05
7,676	E2ETune: End-to-End Knob Tuning via Fine-tuned Generative Language Model	2025	VLDB	4.6770108e-05
7,887	SQLStorm: Taking Database Benchmarking into the LLM Era	2025	VLDB	4.6218382e-05
9,005	Demonstrating SQLBarber: Leveraging Large Language Models to Generate Customized and Realistic SQL Workloads	2025	SIGMOD	4.407457e-05
9,960	QueryArtisan: Generating Data Manipulation Codes for Ad-hoc Analysis in Data Lakes	2025	VLDB	4.2254157e-05
10,093	MCTuner: Spatial Decomposition-Enhanced Database Tuning via LLM-Guided Exploration	2026	SIGMOD	4.1905499e-05
10,212	SQLBarber: A System Leveraging Large Language Models to Generate Customized and Realistic SQL Workloads	2026	SIGMOD	4.1905499e-05

Outgoing Citations (Sorted by Pagerank)

Showing 9 of 9 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank	Cited Paper	Year	Venue	Pagerank
1	Access Path Selection in a Relational Database Management System	1979	SIGMOD	0.0040465394
71	How Good Are Query Optimizers, Really?	2016	VLDB	0.00059446482
101	The Case for Learned Index Structures	2018	SIGMOD	0.00049778866
203	Learned Cardinalities: Estimating Correlated Joins with Deep Learning	2019	CIDR	0.00034868567
329	Neo: A Learned Query Optimizer	2019	VLDB	0.00027301488
535	ATHENA: An Ontology-Driven System for Natural Language Querying over Relational Data Stores	2016	VLDB	0.00020718836
565	NaLIR: An Interactive Natural Language Interface for Querying Relational Databases	2014	SIGMOD	0.00019945986
2,348	RPT: Relational Pre-trained Transformer Is Almost All You Need towards Democratizing Data Preparation	2021	VLDB	8.9903659e-05
3,765	Ember: No-Code Context Enrichment via Similarity-Based Keyless Joins	2022	VLDB	6.7760748e-05

Semantically Similar Papers

Overall Rank	Paper	Year	Venue	Pagerank
9,250	Demonstration of DB-GPT: Next Generation Data Interaction System Empowered by Large Language Models	2024	VLDB	4.3648789e-05
10,221	NL2SQLBench: A Modular Benchmarking Framework for LLM-Enabled NL2SQL Solutions	2026	VLDB	4.1905499e-05
2,435	ScienceBenchmark: A Complex Real-World Benchmark for Evaluating Natural Language to SQL Systems	2024	VLDB	8.8218963e-05
3,978	OmniSQL: Synthesizing High-quality Text-to-SQL Data at Scale	2025	VLDB	6.5662694e-05
10,268	OpenSQL: Data-Efficient Text-to-SQL for Open-Source LLMs via Synthesized Intermediate Supervision	2026	VLDB	4.1905499e-05
4,281	GenEdit: Compounding Operators and Continuous Improvement to Tackle Text-to-SQL in the Enterprise	2025	CIDR	6.2824978e-05
10,108	Reliable Answers for Recurring Questions: Boosting Text-to-SQL Accuracy with Template Constrained Decoding	2026	SIGMOD	4.1905499e-05
4,916	From BERT to GPT-3 Codex: Harnessing the Potential of Very Large Language Models for Data Management	2022	VLDB	5.827729e-05
6,712	Demonstrating GPT-DB: Generating Query-Specific and Customizable Code for SQL Processing with GPT-4	2023	VLDB	4.9474017e-05
998	CodeS: Towards Building Open-source Language Models for Text-to-SQL	2024	SIGMOD	0.00014726344