Database Paper Browser

OmniSQL: Synthesizing High-quality Text-to-SQL Data at Scale

Summary: The paper presents a scalable data synthesis framework that produces SynSQL-2.5M, a dataset of 2.5M text-to-SQL samples spanning ~16k synthetic databases. Each sample comprises a database, a SQL query, a natural language question, and a chain-of-thought. The work addresses data scarcity and the reliance on prompting closed-source models. The dataset is used to train OmniSQL, an open-source model family (7B/14B/32B) that matches or surpasses larger closed- and open-source LLMs (e.g., GPT-4o, DeepSeek-V3). (summarized by gpt-5-mini on Feb 09 2026)

Paper ID: 14078
Venue: VLDB
Year: 2025
Pagerank: 6.5725884e-05
Overall Rank: 3,978 | 72.33%
DOI: 10.14778/3749646.3749723

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 9 of 9 citing papers.

Outgoing Citations (Sorted by Pagerank)

Showing 8 of 8 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Semantically Similar Papers