Database Paper Browser

SchemaPile: A Large Collection of Relational Database Schemas

Summary: SchemaPile is a large GitHub-mined corpus of 221K relational database schemas, comprising 1.7M tables, 10M columns, and 700K foreign keys (FKs), along with rich integrity metadata and content. It goes well beyond single-table corpora. The paper positions it as schema-level training and evaluation data for LLMs and for data management tasks such as FK detection, header detection, and SQL parsing. (summarized by gpt-5.4-mini on May 24 2026)

Paper ID: 6935
Venue: SIGMOD
Year: 2024
Pagerank: 5.2685946e-05
Overall Rank: 5,928 | 58.77%
DOI: 10.1145/3654975

Authors

Incoming Citations (Sorted by Pagerank)

Showing 3 of 3 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
3,978 | OmniSQL: Synthesizing High-quality Text-to-SQL Data at Scale | 2025 | VLDB | 6.5725884e-05
5,437 | SNAILS: Schema Naming Assessments for Improved LLM-Based SQL Inference | 2025 | SIGMOD | 5.5033018e-05
10,197 | Qualitative Join Discovery in Data Lakes using Examples | 2026 | SIGMOD | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 15 of 15 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Semantically Similar Papers