Database Paper Browser

CodeS: Towards Building Open-source Language Models for Text-to-SQL

Summary: Open-source text-to-SQL language models (1B–15B parameters), built with SQL-centric incremental pretraining, that outperform closed-source LLM state-of-the-art methods. Key techniques: schema linking through prompt design and domain adaptation through bidirectional data augmentation. Strong gains on Spider, BIRD, robustness benchmarks, and real-world datasets. (summarized by gpt-5.4-mini on May 24 2026)

Paper ID: 6892
Venue: SIGMOD
Year: 2024
Pagerank: 0.00014729379
Overall Rank: 998 | 93.06%
DOI: 10.1145/3654930

Incoming Citations (Sorted by Pagerank)

Showing 32 of 32 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
3,662 | The Dawn of Natural Language to SQL: Are We Fully Ready? | 2024 | VLDB | 6.8672143e-05
3,859 | OpenSearch-SQL: Enhancing Text-to-SQL with Dynamic Few-shot and Consistency Alignment | 2025 | SIGMOD | 6.6907933e-05
3,978 | OmniSQL: Synthesizing High-quality Text-to-SQL Data at Scale | 2025 | VLDB | 6.5725884e-05
4,289 | GenEdit: Compounding Operators and Continuous Improvement to Tackle Text-to-SQL in the Enterprise | 2025 | CIDR | 6.2885419e-05
4,908 | Combining Small Language Models and Large Language Models for Zero-Shot NL2SQL | 2024 | VLDB | 5.8339245e-05
5,437 | SNAILS: Schema Naming Assessments for Improved LLM-Based SQL Inference | 2025 | SIGMOD | 5.5033018e-05
7,139 | Automated Validating and Fixing of Text-to-SQL Translation with Execution Consistency | 2025 | SIGMOD | 4.821174e-05
7,354 | Reliable Text-to-SQL with Adaptive Abstention | 2025 | SIGMOD | 4.7529612e-05
8,186 | E2ETune: End-to-End Knob Tuning via Fine-tuned Generative Language Model | 2025 | VLDB | 4.5651684e-05
8,896 | SQL-Factory: A Multi-Agent Framework for High-Quality and Large-Scale SQL Generation | 2026 | VLDB | 4.427232e-05
9,151 | The Power of Constraints in Natural Language to SQL Translation | 2025 | VLDB | 4.3849295e-05
9,392 | Demonstrating SQLBarber: Leveraging Large Language Models to Generate Customized and Realistic SQL Workloads | 2025 | SIGMOD | 4.3441378e-05
10,047 | AgentTune: An Agent-Based Large Language Model Framework for Database Knob Tuning | 2026 | SIGMOD | 4.1945683e-05
10,051 | Are Your LLM-based Text-to-SQL Models Secure? Exploring SQL Injection via Backdoor Attacks | 2026 | SIGMOD | 4.1945683e-05
10,093 | MCTuner: Spatial Decomposition-Enhanced Database Tuning via LLM-Guided Exploration | 2026 | SIGMOD | 4.1945683e-05
10,099 | PLForge: Enhancing Language Models for Natural Language to Procedural Extensions of SQL | 2026 | SIGMOD | 4.1945683e-05
10,108 | Reliable Answers for Recurring Questions: Boosting Text-to-SQL Accuracy with Template Constrained Decoding | 2026 | SIGMOD | 4.1945683e-05
10,117 | AixelAsk: A Stepwise-Guided Retrieval and Reasoning Framework for Large Table QA | 2026 | SIGMOD | 4.1945683e-05
10,155 | DIVER: A Robust Text-to-SQL System with Dynamic Interactive Value Linking and Evidence Reasoning | 2026 | SIGMOD | 4.1945683e-05
10,194 | PRISM: Navigating Cost–Accuracy Trade-offs for NL2SQL | 2026 | SIGMOD | 4.1945683e-05
10,210 | SchemaRAG: A Schema-aware Retrieval-Augmented Generation Framework for Text-to-SQL | 2026 | SIGMOD | 4.1945683e-05
10,212 | SQLBarber: A System Leveraging Large Language Models to Generate Customized and Realistic SQL Workloads | 2026 | SIGMOD | 4.1945683e-05
10,221 | NL2SQLBench: A Modular Benchmarking Framework for LLM-Enabled NL2SQL Solutions | 2026 | VLDB | 4.1945683e-05
10,249 | TACO: A Benchmark for Open-Domain Text-to-SQL with Ambiguous and Cross-Database Queries | 2026 | VLDB | 4.1945683e-05
10,268 | OpenSQL: Data-Efficient Text-to-SQL for Open-Source LLMs via Synthesized Intermediate Supervision | 2026 | VLDB | 4.1945683e-05
10,320 | ELT-Bench: An End-to-End Benchmark for Evaluating AI Agents on ELT Pipelines | 2026 | VLDB | 4.1945683e-05
10,327 | Pervasive Annotation Errors Break Text-to-SQL Benchmarks and Leaderboards | 2026 | VLDB | 4.1945683e-05
10,451 | RTS+: Reliable Text to SQL | 2025 | SIGMOD | 4.1945683e-05
10,693 | EvoSchema: Towards Text-to-SQL Robustness Against Schema Evolution | 2025 | VLDB | 4.1945683e-05
10,768 | SiriusBI: A Comprehensive LLM-Powered Solution for Data Analytics in Business Intelligence | 2025 | VLDB | 4.1945683e-05
10,784 | Towards Automated Cross-domain Exploratory Data Analysis through Large Language Models | 2025 | VLDB | 4.1945683e-05
10,837 | Natural Language to SQL: State of the Art and Open Problems | 2025 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 4 of 4 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Semantically Similar Papers