Database Paper Browser


OmniMatch: Joinability Discovery in Data Products

Summary: OmniMatch is a joinability-discovery method for curated data products. It fuses multiple column-pair similarity measures with a self-supervised graph neural network (GNN) that exploits each column's graph neighborhood to boost recall, while automated negative-pair generation raises precision. The result is gains of up to 14% in F1/AUC without per-metric thresholds. (summarized by gpt-5-mini on Feb 09 2026)

Paper ID: 14069
Venue: VLDB
Year: 2025
Pagerank: 4.1945683e-05
Overall Rank: 10,754 | 25.19%
DOI: 10.14778/3749646.3749715

Incoming Non-self Citations Over Time

No non-self incoming citations found for this paper in this database.

Authors

Incoming Citations (Sorted by Pagerank)

Showing 0 of 0 citing papers.


Outgoing Citations (Sorted by Pagerank)

Showing 29 of 29 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
303 | Generic Schema Matching with Cupid | 2001 | VLDB | 0.00028301477
382 | COMA - A system for flexible combination of schema matching approaches | 2002 | VLDB | 0.00024823252
420 | InfoGather: Entity Augmentation and Attribute Discovery By Holistic Matching with Web Tables | 2012 | SIGMOD | 0.00023719065
513 | TURL: Table Understanding through Representation Learning | 2021 | VLDB | 0.00021288342
984 | Natural language to SQL: Where are we today? | 2020 | VLDB | 0.00014857465
1,178 | Table Union Search on Open Data | 2018 | VLDB | 0.00013468118
1,187 | JOSIE: Overlap Set Similarity Search for Finding Joinable Tables in Data Lakes | 2019 | SIGMOD | 0.00013443639
1,463 | ARDA: Automatic Relational Data Augmentation for Machine Learning | 2020 | VLDB | 0.00011869295
1,644 | Finding Related Tables in Data Lakes for Interactive Data Science | 2020 | SIGMOD | 0.00011041787
1,664 | On Multi-Column Foreign Key Discovery | 2010 | VLDB | 0.00010976887
1,914 | Creating Embeddings of Heterogeneous Relational Datasets for Data Integration Tasks | 2020 | SIGMOD | 0.00010109102
2,836 | Semantics-aware Dataset Discovery from Data Lakes with Contextualized Column-based Representation Learning | 2023 | VLDB | 8.0443826e-05
3,000 | SANTOS: Relationship-based Semantic Table Union Search | 2023 | SIGMOD | 7.7462128e-05
3,335 | DeepJoin: Joinable Table Discovery with Pre-trained Language Models | 2023 | VLDB | 7.2065006e-05
3,823 | Automatic Discovery of Attributes in Relational Databases | 2011 | SIGMOD | 6.7261168e-05
4,703 | Medical Entity Disambiguation Using Graph Neural Networks | 2021 | SIGMOD | 5.9855056e-05
4,859 | Integrating Data Lake Tables | 2023 | VLDB | 5.8732433e-05
4,967 | Leva: Boosting Machine Learning Performance with Relational Embedding Data Augmentation | 2022 | SIGMOD | 5.7956612e-05
5,179 | SilkMoth: An Efficient Method for Finding Related Sets with Maximum Matching Constraints | 2017 | VLDB | 5.6428428e-05
5,434 | Auto-FuzzyJoin: Auto-Program Fuzzy Similarity Joins Without Labeled Examples | 2021 | SIGMOD | 5.5045402e-05
5,449 | Transformers for Tabular Data Representation: A Tutorial on Models and Applications | 2022 | VLDB | 5.5008652e-05
5,794 | Discovering Related Data At Scale | 2021 | VLDB | 5.3245122e-05
7,006 | Synthesizing Products for Online Catalogs | 2011 | VLDB | 4.8653916e-05
7,048 | Magneto: Combining Small and Large Language Models for Schema Matching | 2025 | VLDB | 4.8520651e-05
7,613 | ADnEV: Cross-Domain Schema Matching using Deep Similarity Matrix Adjustment and Evaluation | 2020 | VLDB | 4.6961059e-05
8,137 | Customizable and Scalable Fuzzy Join for Big Data | 2019 | VLDB | 4.5774794e-05
8,193 | WarpGate: A Semantic Join Discovery System for Cloud Data Warehouses | 2023 | CIDR | 4.5618596e-05
8,503 | A Demonstration of KGLac: A Data Discovery and Enrichment Platform for Data Science | 2021 | VLDB | 4.496339e-05
8,958 | FlexER: Flexible Entity Resolution for Multiple Intents | 2023 | SIGMOD | 4.4210635e-05

Semantically Similar Papers