Database Paper Browser


Qualitative Join Discovery in Data Lakes using Examples

Summary: Query-by-example (QbE) join discovery for data lakes with poor schema information and metadata. SemDisc finds hybrid join paths that mix equi-joins and semantic joins, starting from a few example values. The key novelty is support for intermediate non-overlapping tables while still guaranteeing example containment, combined with index-backed path search and precision gains of more than 3x. (summarized by gpt-5-mini on Apr 11 2026)

Paper ID: 7508
Venue: SIGMOD
Year: 2026
Pagerank: 4.1945683e-05
Overall Rank: 10,197 | 29.07%
DOI: 10.1145/3786682

Incoming Non-self Citations Over Time

No non-self incoming citations found for this paper in this database.

Authors

Incoming Citations (Sorted by Pagerank)

Showing 0 of 0 citing papers.


Outgoing Citations (Sorted by Pagerank)

Showing 38 of 38 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
204 | Learned Cardinalities: Estimating Correlated Joins with Deep Learning | 2019 | CIDR | 0.00034784455
211 | Join Synopses for Approximate Query Answering | 1999 | SIGMOD | 0.00033981214
221 | Deep Entity Matching with Pre-Trained Language Models | 2021 | VLDB | 0.00033121824
513 | TURL: Table Understanding through Representation Learning | 2021 | VLDB | 0.00021288342
608 | DeepDB: Learn from Data, not from Queries! | 2020 | VLDB | 0.00019235898
806 | An End-to-End Learning-based Cost Estimator | 2020 | VLDB | 0.00016434274
910 | NeuroCard: One Cardinality Estimator for All Tables | 2021 | VLDB | 0.00015423056
939 | Data Lake Management: Challenges and Opportunities | 2019 | VLDB | 0.00015187344
1,105 | Cardinality Estimation Done Right: Index-Based Join Sampling | 2017 | CIDR | 0.00013990395
1,187 | JOSIE: Overlap Set Similarity Search for Finding Joinable Tables in Data Lakes | 2019 | SIGMOD | 0.00013443639
1,459 | Query From Examples: An Iterative, Data-Driven Approach to Query Construction | 2015 | VLDB | 0.00011889802
1,509 | Discovering Queries based on Example Tuples | 2014 | SIGMOD | 0.00011612727
1,703 | Are We Ready For Learned Cardinality Estimation? | 2021 | VLDB | 0.00010836769
1,751 | Auctus: A Dataset Search Engine for Data Discovery and Augmentation | 2021 | VLDB | 0.00010683295
2,141 | LSH Ensemble: Internet-Scale Domain Search | 2016 | VLDB | 9.4542625e-05
2,576 | S4: Top-k Spreadsheet-Style Search for Query Discovery | 2015 | SIGMOD | 8.5112408e-05
2,836 | Semantics-aware Dataset Discovery from Data Lakes with Contextualized Column-based Representation Learning | 2023 | VLDB | 8.0443826e-05
3,000 | SANTOS: Relationship-based Semantic Table Union Search | 2023 | SIGMOD | 7.7462128e-05
3,266 | Learned Cardinality Estimation: An In-depth Study | 2022 | SIGMOD | 7.3074684e-05
3,335 | DeepJoin: Joinable Table Discovery with Pre-trained Language Models | 2023 | VLDB | 7.2065006e-05
3,358 | Organizing Data Lakes for Navigation | 2020 | SIGMOD | 7.1784949e-05
3,661 | Example-Driven Query Intent Discovery: Abductive Reasoning using Semantic Similarity | 2019 | VLDB | 6.8689912e-05
3,924 | A Unified Deep Model of Learning from both Data and Queries for Cardinality Estimation | 2021 | SIGMOD | 6.6271553e-05
4,859 | Integrating Data Lake Tables | 2023 | VLDB | 5.8732433e-05
4,985 | Pivot-based Metric Indexing | 2017 | VLDB | 5.7856648e-05
5,099 | ArcheType: A Novel Framework for Open-Source Column Type Annotation using Large Language Models | 2024 | VLDB | 5.6997784e-05
5,928 | SchemaPile: A Large Collection of Relational Database Schemas | 2024 | SIGMOD | 5.2685946e-05
6,270 | MATE: Multi-Attribute Table Extraction | 2022 | VLDB | 5.1337451e-05
6,438 | RONIN: Data Lake Exploration | 2021 | VLDB | 5.0620163e-05
7,123 | ASM: Harmonizing Autoregressive Model, Sampling, and Multi-dimensional Statistics Merging for Cardinality Estimation | 2024 | SIGMOD | 4.8251036e-05
7,303 | DICE: Data Discovery by Example | 2021 | VLDB | 4.7684686e-05
7,868 | Solo: Data Discovery Using Natural Language Questions Via A Self-Supervised Approach | 2023 | SIGMOD | 4.6319504e-05
8,116 | LakeBench: A Benchmark for Discovering Joinable and Unionable Tables in Data Lakes | 2024 | VLDB | 4.581507e-05
8,193 | WarpGate: A Semantic Join Discovery System for Cloud Data Warehouses | 2023 | CIDR | 4.5618596e-05
8,503 | A Demonstration of KGLac: A Data Discovery and Enrichment Platform for Data Science | 2021 | VLDB | 4.496339e-05
8,579 | RECA: Related Tables Enhanced Column Semantic Type Annotation Framework | 2023 | VLDB | 4.4922446e-05
8,618 | Nexus: Correlation Discovery over Collections of Spatio-Temporal Tabular Data | 2024 | SIGMOD | 4.4838259e-05
9,928 | Fainder: A Fast and Accurate Index for Distribution-Aware Dataset Search | 2024 | VLDB | 4.2511622e-05

Semantically Similar Papers