Database Paper Browser

GitTables: A Large-Scale Corpus of Relational Tables

Summary: GitTables provides ~1M relational tables from GitHub, with a path to 10M+, enabling training on offline, non-HTML relational data. Its distinguishing feature is semantic-type annotations drawn from Schema.org and DBpedia, which enable learned type detection, schema completion, and table-to-KG benchmarks for data management. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 6533
Venue: SIGMOD
Year: 2023
Pagerank: 7.0131061e-05
Overall Rank: 3,520 | 75.52%
DOI: 10.1145/3588710

Incoming Citations (Sorted by Pagerank)

Showing 18 of 18 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
3,015 | Chorus: Foundation Models for Unified Data Discovery and Exploration | 2024 | VLDB | 7.7092391e-05
3,978 | OmniSQL: Synthesizing High-quality Text-to-SQL Data at Scale | 2025 | VLDB | 6.5725884e-05
5,099 | ArcheType: A Novel Framework for Open-Source Column Type Annotation using Large Language Models | 2024 | VLDB | 5.6997784e-05
5,928 | SchemaPile: A Large Collection of Relational Database Schemas | 2024 | SIGMOD | 5.2685946e-05
7,026 | Mind the Data Gap: Bridging LLMs to Enterprise Data Integration | 2025 | CIDR | 4.8570811e-05
8,204 | ELEET: Efficient Learned Query Execution over Text and Tables | 2024 | VLDB | 4.5594273e-05
8,736 | Unveiling Challenges for LLMs in Enterprise Data Engineering | 2026 | VLDB | 4.456315e-05
8,852 | Watchog: A Light-weight Contrastive Learning based Framework for Column Annotation | 2023 | SIGMOD | 4.4356508e-05
8,913 | Making Table Understanding Work in Practice | 2022 | CIDR | 4.427232e-05
9,928 | Fainder: A Fast and Accurate Index for Distribution-Aware Dataset Search | 2024 | VLDB | 4.2511622e-05
10,109 | Retrieve-and-Verify: A Table Context Selection Framework for Accurate Column Annotations | 2026 | SIGMOD | 4.1945683e-05
10,142 | AutoDDG: Automated Dataset Description Generation using Large Language Models | 2026 | SIGMOD | 4.1945683e-05
10,498 | PLM4NDV: Minimizing Data Access for Number of Distinct Values Estimation with Pre-trained Language Models | 2025 | SIGMOD | 4.1945683e-05
10,510 | Table Overlap Estimation through Graph Embeddings | 2025 | SIGMOD | 4.1945683e-05
10,534 | AdaNDV: Adaptive Number of Distinct Value Estimation via Learning to Select and Fuse Estimators | 2025 | VLDB | 4.1945683e-05
10,753 | Cents: A Flexible and Cost-Effective Framework for LLM-Based Table Understanding | 2025 | VLDB | 4.1945683e-05
10,951 | Determining the Largest Overlap between Tables | 2024 | SIGMOD | 4.1945683e-05
11,205 | Steered Training Data Generation for Learned Semantic Type Detection | 2023 | SIGMOD | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 4 of 4 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
107 | WebTables: Exploring the Power of Tables on the Web | 2008 | VLDB | 0.00048377684
513 | TURL: Table Understanding through Representation Learning | 2021 | VLDB | 0.00021288342
3,155 | Ten Years of WebTables | 2018 | VLDB | 7.4672742e-05
4,630 | Knowledge Graphs 2021: A Data Odyssey | 2021 | VLDB | 6.0348379e-05

Semantically Similar Papers