Back to papers
SANTOS: Relationship-based Semantic Table Union Search
Summary: SANTOS: relationship-based semantic table union search; unionability via inter-column relationships and column semantics. Two methods: KB-driven relation extraction and a data-lake synthesized KB; outperforms column-based union search on open benchmarks.
(summarized by gpt-5-nano on Feb 09 2026)
- Paper ID
- 6512
- Venue
- SIGMOD
- Year
- 2023
- Pagerank
- 7.7462128e-05
- Overall Rank
- 3,000 | 79.14%
- DOI
-
10.1145/3588689
Incoming Non-self Citations Over Time
Incoming Citations (Sorted by Pagerank)
Showing 22 of 22 citing papers.
| Rank |
Citing Paper |
Year |
Venue |
Pagerank |
| 2,836 |
Semantics-aware Dataset Discovery from Data Lakes with Contextualized Column-based Representation Learning |
2023 |
VLDB |
8.0443826e-05 |
| 3,015 |
Chorus: Foundation Models for Unified Data Discovery and Exploration |
2024 |
VLDB |
7.7092391e-05 |
| 4,859 |
Integrating Data Lake Tables |
2023 |
VLDB |
5.8732433e-05 |
| 5,099 |
ArcheType: A Novel Framework for Open-Source Column Type Annotation using Large Language Models |
2024 |
VLDB |
5.6997784e-05 |
| 7,048 |
Magneto: Combining Small and Large Language Models for Schema Matching |
2025 |
VLDB |
4.8520651e-05 |
| 7,582 |
LakeCompass: An End-to-End System for Data Maintenance, Search and Analysis in Data Lakes |
2024 |
VLDB |
4.7046388e-05 |
| 8,116 |
LakeBench: A Benchmark for Discovering Joinable and Unionable Tables in Data Lakes |
2024 |
VLDB |
4.581507e-05 |
| 8,910 |
R2D2: Reducing Redundancy and Duplication in Data Lakes |
2023 |
SIGMOD |
4.427232e-05 |
| 10,109 |
Retrieve-and-Verify: A Table Context Selection Framework for Accurate Column Annotations |
2026 |
SIGMOD |
4.1945683e-05 |
| 10,142 |
AutoDDG: Automated Dataset Description Generation using Large Language Models |
2026 |
SIGMOD |
4.1945683e-05 |
| 10,197 |
Qualitative Join Discovery in Data Lakes using Examples |
2026 |
SIGMOD |
4.1945683e-05 |
| 10,510 |
Table Overlap Estimation through Graph Embeddings |
2025 |
SIGMOD |
4.1945683e-05 |
| 10,589 |
Birdie: Natural Language-Driven Table Discovery Using Differentiable Search Index |
2025 |
VLDB |
4.1945683e-05 |
| 10,628 |
CatDB: Data-catalog-guided, LLM-based Generation of Data-centric ML Pipelines |
2025 |
VLDB |
4.1945683e-05 |
| 10,645 |
OpenForge: Probabilistic Metadata Integration |
2025 |
VLDB |
4.1945683e-05 |
| 10,685 |
LakeVisage: Towards Scalable, Flexible and Interactive Visualization Recommendation for Data Discovery over Data Lakes |
2025 |
VLDB |
4.1945683e-05 |
| 10,754 |
OmniMatch: Joinability Discovery in Data Products |
2025 |
VLDB |
4.1945683e-05 |
| 10,823 |
TableCopilot: A Table Assistant Empowered by Natural Language Conditional Table Discovery |
2025 |
VLDB |
4.1945683e-05 |
| 10,973 |
Unstructured Data Fusion for Schema and Data Extraction |
2024 |
SIGMOD |
4.1945683e-05 |
| 11,054 |
Enriching Relations with Additional Attributes for ER |
2024 |
VLDB |
4.1945683e-05 |
| 11,063 |
Searching Data Lakes for Nested and Joined Data |
2024 |
VLDB |
4.1945683e-05 |
| 11,097 |
Navigating Data Repositories: Utilizing Line Charts to Discover Relevant Datasets |
2024 |
VLDB |
4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 19 of 19 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank |
Cited Paper |
Year |
Venue |
Pagerank |
| 364 |
Annotating and Searching Web Tables Using Entities, Types and Relationships |
2010 |
VLDB |
0.00025637562 |
| 428 |
Latent Semantic Indexing: A Probabilistic Analysis |
1998 |
PODS |
0.00023512226 |
| 513 |
TURL: Table Understanding through Representation Learning |
2021 |
VLDB |
0.00021288342 |
| 518 |
Data Integration for the Relational Web |
2009 |
VLDB |
0.00021158934 |
| 818 |
Finding Related Tables |
2012 |
SIGMOD |
0.00016311524 |
| 939 |
Data Lake Management: Challenges and Opportunities |
2019 |
VLDB |
0.00015187344 |
| 1,001 |
Recovering Semantics of Tables on the Web |
2011 |
VLDB |
0.00014706505 |
| 1,178 |
Table Union Search on Open Data |
2018 |
VLDB |
0.00013468118 |
| 1,644 |
Finding Related Tables in Data Lakes for Interactive Data Science |
2020 |
SIGMOD |
0.00011041787 |
| 1,914 |
Creating Embeddings of Heterogeneous Relational Datasets for Data Integration Tasks |
2020 |
SIGMOD |
0.00010109102 |
| 2,141 |
LSH Ensemble: Internet-Scale Domain Search |
2016 |
VLDB |
9.4542625e-05 |
| 2,517 |
Annotating Columns with Pre-trained Language Models |
2022 |
SIGMOD |
8.6092139e-05 |
| 2,633 |
Schema Extraction for Tabular Data on the Web |
2013 |
VLDB |
8.4063569e-05 |
| 2,730 |
Open Data Integration |
2018 |
VLDB |
8.2126735e-05 |
| 2,888 |
Sato: Contextual Semantic Type Detection in Tables |
2020 |
VLDB |
7.9594996e-05 |
| 3,797 |
Stitching Web Tables for Improving Matching Quality |
2017 |
VLDB |
6.7597149e-05 |
| 4,801 |
CLAMS: Bringing Quality to Data Lakes |
2016 |
SIGMOD |
5.9115269e-05 |
| 5,529 |
Data-Driven Domain Discovery for Structured Datasets |
2020 |
VLDB |
5.4566641e-05 |
| 8,787 |
QuTE: Answering Quantity Queries from Web Tables |
2021 |
SIGMOD |
4.4520613e-05 |
Semantically Similar Papers