Database Paper Browser

Table Union Search on Open Data

Summary: Proposes probabilistic table union search over Open Data, which finds unionable tables by identifying attributes that share a domain. The paper presents three unionability models (set-domain, ontology-based semantic, and natural-language domain), a data-driven selector that chooses a model for each attribute pair, and a distribution-aware union-size algorithm. It also provides an open benchmark and reports results that scale to more than 1M attributes. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 11786
Venue: VLDB
Year: 2018
Pagerank: 0.00013468118
Overall Rank: 1,178 | 91.81%
DOI: 10.14778/3192965.3192973

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 46 of 46 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
939 | Data Lake Management: Challenges and Opportunities | 2019 | VLDB | 0.00015187344
1,187 | JOSIE: Overlap Set Similarity Search for Finding Joinable Tables in Data Lakes | 2019 | SIGMOD | 0.00013443639
1,644 | Finding Related Tables in Data Lakes for Interactive Data Science | 2020 | SIGMOD | 0.00011041787
1,751 | Auctus: A Dataset Search Engine for Data Discovery and Augmentation | 2021 | VLDB | 0.00010683295
2,730 | Open Data Integration | 2018 | VLDB | 8.2126735e-05
2,836 | Semantics-aware Dataset Discovery from Data Lakes with Contextualized Column-based Representation Learning | 2023 | VLDB | 8.0443826e-05
3,000 | SANTOS: Relationship-based Semantic Table Union Search | 2023 | SIGMOD | 7.7462128e-05
3,335 | DeepJoin: Joinable Table Discovery with Pre-trained Language Models | 2023 | VLDB | 7.2065006e-05
3,358 | Organizing Data Lakes for Navigation | 2020 | SIGMOD | 7.1784949e-05
3,824 | Correlation Sketches for Approximate Join-Correlation Queries | 2021 | SIGMOD | 6.7260705e-05
3,963 | Pytheas: Pattern-based Table Discovery in CSV Files | 2020 | VLDB | 6.5840643e-05
4,595 | Juneau: Data Lake Management for Jupyter | 2019 | VLDB | 6.060188e-05
4,859 | Integrating Data Lake Tables | 2023 | VLDB | 5.8732433e-05
5,024 | Towards Distribution-aware Query Answering in Data Markets | 2022 | VLDB | 5.7535043e-05
5,381 | Selective Data Acquisition in the Wild for Model Charging | 2022 | VLDB | 5.5399508e-05
5,529 | Data-Driven Domain Discovery for Structured Datasets | 2020 | VLDB | 5.4566641e-05
5,963 | Automatic Data Acquisition for Deep Learning | 2021 | VLDB | 5.2526794e-05
5,976 | Responsible Data Integration: Next-generation Challenges | 2022 | SIGMOD | 5.245976e-05
6,092 | Observatory: Characterizing Embeddings of Relational Tables | 2024 | VLDB | 5.2138566e-05
6,233 | Mosaic: A Sample-Based Database System for Open World Query Processing | 2020 | CIDR | 5.1451876e-05
6,270 | MATE: Multi-Attribute Table Extraction | 2022 | VLDB | 5.1337451e-05
6,438 | RONIN: Data Lake Exploration | 2021 | VLDB | 5.0620163e-05
6,449 | Causal Data Integration | 2023 | VLDB | 5.0587746e-05
6,894 | TableDC: Deep Clustering for Tabular Data | 2025 | SIGMOD | 4.8925595e-05
7,643 | Cross Modal Data Discovery over Structured and Unstructured Data Lakes | 2023 | VLDB | 4.6901105e-05
8,116 | LakeBench: A Benchmark for Discovering Joinable and Unionable Tables in Data Lakes | 2024 | VLDB | 4.581507e-05
8,193 | WarpGate: A Semantic Join Discovery System for Cloud Data Warehouses | 2023 | CIDR | 4.5618596e-05
8,503 | A Demonstration of KGLac: A Data Discovery and Enrichment Platform for Data Science | 2021 | VLDB | 4.496339e-05
8,618 | Nexus: Correlation Discovery over Collections of Spatio-Temporal Tabular Data | 2024 | SIGMOD | 4.4838259e-05
8,729 | OneProvenance: Efficient Extraction of Dynamic Coarse-Grained Provenance From Database Query Event Logs | 2023 | VLDB | 4.4582221e-05
8,910 | R2D2: Reducing Redundancy and Duplication in Data Lakes | 2023 | SIGMOD | 4.427232e-05
9,371 | Auto-Formula: Recommend Formulas in Spreadsheets using Contrastive Learning for Table Representations | 2024 | SIGMOD | 4.3480692e-05
9,961 | QueryArtisan: Generating Data Manipulation Codes for Ad-hoc Analysis in Data Lakes | 2025 | VLDB | 4.2294678e-05
10,109 | Retrieve-and-Verify: A Table Context Selection Framework for Accurate Column Annotations | 2026 | SIGMOD | 4.1945683e-05
10,329 | Revisiting Task-Oriented Dataset Search in the Era of Large Language Models: Challenges, Benchmark, and Solution | 2026 | VLDB | 4.1945683e-05
10,510 | Table Overlap Estimation through Graph Embeddings | 2025 | SIGMOD | 4.1945683e-05
10,589 | Birdie: Natural Language-Driven Table Discovery Using Differentiable Search Index | 2025 | VLDB | 4.1945683e-05
10,595 | Optimized Batch Prompting for Cost-effective LLMs | 2025 | VLDB | 4.1945683e-05
10,645 | OpenForge: Probabilistic Metadata Integration | 2025 | VLDB | 4.1945683e-05
10,685 | LakeVisage: Towards Scalable, Flexible and Interactive Visualization Recommendation for Data Discovery over Data Lakes | 2025 | VLDB | 4.1945683e-05
10,754 | OmniMatch: Joinability Discovery in Data Products | 2025 | VLDB | 4.1945683e-05
10,836 | Data Discovery in Data Lakes: Operations, Indexes, Systems | 2025 | VLDB | 4.1945683e-05
10,951 | Determining the Largest Overlap between Tables | 2024 | SIGMOD | 4.1945683e-05
11,054 | Enriching Relations with Additional Attributes for ER | 2024 | VLDB | 4.1945683e-05
11,097 | Navigating Data Repositories: Utilizing Line Charts to Discover Relevant Datasets | 2024 | VLDB | 4.1945683e-05
11,528 | Valentine in Action: Matching Tabular Data at Scale | 2021 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 13 of 13 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Semantically Similar Papers