Database Paper Browser


Creating Embeddings of Heterogeneous Relational Datasets for Data Integration Tasks

Summary: Proposes EmbDI, a framework that learns local relational embeddings from a compact graph-based representation of the data for data integration. Sentences derived from the graph capture similarity among tokens, attributes, and rows, which supports schema matching and entity resolution. (summarized by gpt-5-nano on Feb 09 2026)
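To make "sentences derived from the graph" concrete, here is a minimal sketch in Python. It assumes a graph with three node kinds (one node per row, one per column, and one per distinct cell value), where each value links to every row and column it appears in. Values shared across tables therefore become shared nodes. Random walks over this graph yield the "sentences". The node-name prefixes (`row:`, `col:`, `tok:`) and the functions `build_graph` and `random_walks` are hypothetical names chosen for illustration. Walks start from every node here, and the walk count and length are arbitrary. EmbDI's actual construction, walk strategy, and parameters may differ.

```python
import random
from collections import defaultdict


def build_graph(tables):
    """Build a row/column/value graph from {table_name: (columns, rows)}.

    Hypothetical sketch: each cell value becomes a token node linked to its
    row node and its column node; equal values in different tables share
    one token node, which is what connects the datasets.
    """
    adj = defaultdict(set)
    for table, (columns, rows) in tables.items():
        for i, row in enumerate(rows):
            row_node = f"row:{table}.{i}"
            for col, value in zip(columns, row):
                col_node = f"col:{table}.{col}"
                tok_node = f"tok:{value}"
                # Undirected edges: value <-> row and value <-> column.
                adj[tok_node].update((row_node, col_node))
                adj[row_node].add(tok_node)
                adj[col_node].add(tok_node)
    return adj


def random_walks(adj, walks_per_node, walk_length, seed=0):
    """Generate fixed-length uniform random walks ("sentences") from every node."""
    rng = random.Random(seed)
    sentences = []
    for start in sorted(adj):
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                # Sort neighbors so results are reproducible for a given seed.
                walk.append(rng.choice(sorted(adj[walk[-1]])))
            sentences.append(walk)
    return sentences


# Usage: two toy tables that share the values "Paolo" and "Turin".
tables = {
    "A": (["name", "city"], [["Paolo", "Turin"], ["Sara", "Paris"]]),
    "B": (["person", "town"], [["Paolo", "Turin"]]),
}
graph = build_graph(tables)
sentences = random_walks(graph, walks_per_node=2, walk_length=5, seed=1)
```

In this sketch, the shared node `tok:Paolo` lets walks cross from `col:A.name` to `col:B.person`, and from `row:A.0` to `row:B.0`. Co-occurrence of this kind is what a word2vec-style trainer would turn into similar vectors for matching columns and rows. The training step is omitted here.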

Paper ID: 5957
Venue: SIGMOD
Year: 2020
Pagerank: 0.00010109102
Overall Rank: 1,914 | 86.69%
DOI: 10.1145/3318464.3389742

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 32 of 32 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
2,349 | RPT: Relational Pre-trained Transformer Is Almost All You Need towards Democratizing Data Preparation | 2021 | VLDB | 8.9876423e-05
2,364 | Deep Learning Models for Selectivity Estimation of Multi-Attribute Queries | 2020 | SIGMOD | 8.9554751e-05
2,517 | Annotating Columns with Pre-trained Language Models | 2022 | SIGMOD | 8.6092139e-05
2,836 | Semantics-aware Dataset Discovery from Data Lakes with Contextualized Column-based Representation Learning | 2023 | VLDB | 8.0443826e-05
3,000 | SANTOS: Relationship-based Semantic Table Union Search | 2023 | SIGMOD | 7.7462128e-05
3,640 | Deep Learning for Blocking in Entity Matching: A Design Space Exploration | 2021 | VLDB | 6.8891671e-05
4,359 | Astrid: Accurate Selectivity Estimation for String Predicates using Deep Learning | 2021 | VLDB | 6.2569955e-05
4,462 | LOGER: A Learned Optimizer towards Generating Efficient and Robust Query Execution Plans | 2023 | VLDB | 6.1611784e-05
4,859 | Integrating Data Lake Tables | 2023 | VLDB | 5.8732433e-05
4,967 | Leva: Boosting Machine Learning Performance with Relational Embedding Data Augmentation | 2022 | SIGMOD | 5.7956612e-05
5,429 | DiffPrep: Differentiable Data Preprocessing Pipeline Search for Learning over Tabular Data | 2023 | SIGMOD | 5.5087325e-05
5,449 | Transformers for Tabular Data Representation: A Tutorial on Models and Applications | 2022 | VLDB | 5.5008652e-05
5,941 | Big Graphs: Challenges and Opportunities | 2022 | VLDB | 5.2635446e-05
6,092 | Observatory: Characterizing Embeddings of Relational Tables | 2024 | VLDB | 5.2138566e-05
6,894 | TableDC: Deep Clustering for Tabular Data | 2025 | SIGMOD | 4.8925595e-05
7,048 | Magneto: Combining Small and Large Language Models for Schema Matching | 2025 | VLDB | 4.8520651e-05
8,958 | FlexER: Flexible Entity Resolution for Multiple Intents | 2023 | SIGMOD | 4.4210635e-05
9,235 | ThriftLLM: On Cost-Effective Selection of Large Language Models for Classification Queries | 2025 | VLDB | 4.3690661e-05
9,262 | SubTab: Data Exploration with Informative Sub-Tables | 2022 | SIGMOD | 4.368964e-05
9,394 | BigVectorBench: Heterogeneous Data Embedding and Compound Queries are Essential in Evaluating Vector Databases | 2025 | VLDB | 4.3441378e-05
9,683 | Hierarchical Entity Resolution using an Oracle | 2022 | SIGMOD | 4.3047774e-05
9,777 | Data Augmentation for ML-driven Data Preparation and Integration | 2021 | VLDB | 4.2856106e-05
10,268 | OpenSQL: Data-Efficient Text-to-SQL for Open-Source LLMs via Synthesized Intermediate Supervision | 2026 | VLDB | 4.1945683e-05
10,269 | Database Views as Explanations for Relational Deep Learning | 2026 | VLDB | 4.1945683e-05
10,510 | Table Overlap Estimation through Graph Embeddings | 2025 | SIGMOD | 4.1945683e-05
10,754 | OmniMatch: Joinability Discovery in Data Products | 2025 | VLDB | 4.1945683e-05
10,951 | Determining the Largest Overlap between Tables | 2024 | SIGMOD | 4.1945683e-05
11,047 | Blocker and Matcher Can Mutually Benefit: A Co-Learning Framework for Low-Resource Entity Resolution | 2024 | VLDB | 4.1945683e-05
11,216 | Demystifying the QoS and QoE of Edge-hosted Video Streaming Applications in the Wild with SNESet | 2023 | SIGMOD | 4.1945683e-05
11,230 | VersaMatch: Ontology Matching with Weak Supervision | 2023 | VLDB | 4.1945683e-05
11,266 | MINT: Detecting Fraudulent Behaviors from Time-series Relational Data | 2023 | VLDB | 4.1945683e-05
11,343 | SPINE: Scaling up Programming-by-Negative-Example for String Filtering and Transformation | 2022 | SIGMOD | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 13 of 13 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.


Semantically Similar Papers