Database Paper Browser


SANTOS: Relationship-based Semantic Table Union Search

Summary: SANTOS performs relationship-based semantic table union search, judging whether tables are unionable from both column semantics and the relationships between columns. It offers two ways to obtain these semantics: relation extraction driven by an existing knowledge base (KB), and a KB synthesized from the data lake itself. On open benchmarks it outperforms column-based union search. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 6512
Venue: SIGMOD
Year: 2023
Pagerank: 7.7462128e-05
Overall Rank: 3,000 | 79.14%
DOI: 10.1145/3588689



Incoming Citations (Sorted by Pagerank)

Showing 22 of 22 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
2,836 | Semantics-aware Dataset Discovery from Data Lakes with Contextualized Column-based Representation Learning | 2023 | VLDB | 8.0443826e-05
3,015 | Chorus: Foundation Models for Unified Data Discovery and Exploration | 2024 | VLDB | 7.7092391e-05
4,859 | Integrating Data Lake Tables | 2023 | VLDB | 5.8732433e-05
5,099 | ArcheType: A Novel Framework for Open-Source Column Type Annotation using Large Language Models | 2024 | VLDB | 5.6997784e-05
7,048 | Magneto: Combining Small and Large Language Models for Schema Matching | 2025 | VLDB | 4.8520651e-05
7,582 | LakeCompass: An End-to-End System for Data Maintenance, Search and Analysis in Data Lakes | 2024 | VLDB | 4.7046388e-05
8,116 | LakeBench: A Benchmark for Discovering Joinable and Unionable Tables in Data Lakes | 2024 | VLDB | 4.581507e-05
8,910 | R2D2: Reducing Redundancy and Duplication in Data Lakes | 2023 | SIGMOD | 4.427232e-05
10,109 | Retrieve-and-Verify: A Table Context Selection Framework for Accurate Column Annotations | 2026 | SIGMOD | 4.1945683e-05
10,142 | AutoDDG: Automated Dataset Description Generation using Large Language Models | 2026 | SIGMOD | 4.1945683e-05
10,197 | Qualitative Join Discovery in Data Lakes using Examples | 2026 | SIGMOD | 4.1945683e-05
10,510 | Table Overlap Estimation through Graph Embeddings | 2025 | SIGMOD | 4.1945683e-05
10,589 | Birdie: Natural Language-Driven Table Discovery Using Differentiable Search Index | 2025 | VLDB | 4.1945683e-05
10,628 | CatDB: Data-catalog-guided, LLM-based Generation of Data-centric ML Pipelines | 2025 | VLDB | 4.1945683e-05
10,645 | OpenForge: Probabilistic Metadata Integration | 2025 | VLDB | 4.1945683e-05
10,685 | LakeVisage: Towards Scalable, Flexible and Interactive Visualization Recommendation for Data Discovery over Data Lakes | 2025 | VLDB | 4.1945683e-05
10,754 | OmniMatch: Joinability Discovery in Data Products | 2025 | VLDB | 4.1945683e-05
10,823 | TableCopilot: A Table Assistant Empowered by Natural Language Conditional Table Discovery | 2025 | VLDB | 4.1945683e-05
10,973 | Unstructured Data Fusion for Schema and Data Extraction | 2024 | SIGMOD | 4.1945683e-05
11,054 | Enriching Relations with Additional Attributes for ER | 2024 | VLDB | 4.1945683e-05
11,063 | Searching Data Lakes for Nested and Joined Data | 2024 | VLDB | 4.1945683e-05
11,097 | Navigating Data Repositories: Utilizing Line Charts to Discover Relevant Datasets | 2024 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 19 of 19 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
364 | Annotating and Searching Web Tables Using Entities, Types and Relationships | 2010 | VLDB | 0.00025637562
428 | Latent Semantic Indexing: A Probabilistic Analysis | 1998 | PODS | 0.00023512226
513 | TURL: Table Understanding through Representation Learning | 2021 | VLDB | 0.00021288342
518 | Data Integration for the Relational Web | 2009 | VLDB | 0.00021158934
818 | Finding Related Tables | 2012 | SIGMOD | 0.00016311524
939 | Data Lake Management: Challenges and Opportunities | 2019 | VLDB | 0.00015187344
1,001 | Recovering Semantics of Tables on the Web | 2011 | VLDB | 0.00014706505
1,178 | Table Union Search on Open Data | 2018 | VLDB | 0.00013468118
1,644 | Finding Related Tables in Data Lakes for Interactive Data Science | 2020 | SIGMOD | 0.00011041787
1,914 | Creating Embeddings of Heterogeneous Relational Datasets for Data Integration Tasks | 2020 | SIGMOD | 0.00010109102
2,141 | LSH Ensemble: Internet-Scale Domain Search | 2016 | VLDB | 9.4542625e-05
2,517 | Annotating Columns with Pre-trained Language Models | 2022 | SIGMOD | 8.6092139e-05
2,633 | Schema Extraction for Tabular Data on the Web | 2013 | VLDB | 8.4063569e-05
2,730 | Open Data Integration | 2018 | VLDB | 8.2126735e-05
2,888 | Sato: Contextual Semantic Type Detection in Tables | 2020 | VLDB | 7.9594996e-05
3,797 | Stitching Web Tables for Improving Matching Quality | 2017 | VLDB | 6.7597149e-05
4,801 | CLAMS: Bringing Quality to Data Lakes | 2016 | SIGMOD | 5.9115269e-05
5,529 | Data-Driven Domain Discovery for Structured Datasets | 2020 | VLDB | 5.4566641e-05
8,787 | QuTE: Answering Quantity Queries from Web Tables | 2021 | SIGMOD | 4.4520613e-05

Semantically Similar Papers