Leva: Boosting Machine Learning Performance with Relational Embedding Data Augmentation

Summary: Leva constructs a relational embedding by graphifying the database and learning vectors that summarize the entire database. Downstream supervision filters out noisy graph signals. This reduces the burden of cross-relational feature engineering and data discovery, and it improves ML performance on classification and regression tasks. (summarized by gpt-5-nano on Feb 09 2026)
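The core idea in the summary is to turn a database into a graph, so that rows linked through shared values sit close together without any join being written. The sketch below illustrates that idea and rests on several assumptions. The row-node/value-node construction, the helper names `graphify` and `random_walks`, and the walk parameters are all hypothetical, and they may differ from how Leva actually builds its graph or learns its vectors. The code stops at generating walks. An embedding learner (for example, a skip-gram model) would then be trained on those walks, and that step is not shown.

```python
import random
from collections import defaultdict


def graphify(tables):
    """Build an undirected row-value graph from {table_name: [row_dict, ...]}.

    Hypothetical construction: each row becomes a node "table#i" and each
    non-null cell value becomes a node "val:<value>". Rows in different tables
    that share a value (for example a key column) end up two hops apart, so
    cross-table links appear without any join being specified.
    """
    adj = defaultdict(set)
    for name, rows in tables.items():
        for i, row in enumerate(rows):
            row_node = f"{name}#{i}"
            adj.setdefault(row_node, set())  # keep rows whose cells are all null
            for value in row.values():
                if value is None:
                    continue  # skip missing cells instead of linking rows via a shared null
                value_node = f"val:{value}"
                adj[row_node].add(value_node)
                adj[value_node].add(row_node)
    return dict(adj)


def random_walks(adj, walks_per_node=2, walk_length=5, seed=0):
    """Generate uniform random walks that an embedding learner could consume."""
    rng = random.Random(seed)
    walks = []
    for start in sorted(adj):
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adj[walk[-1]]
                if not neighbors:
                    break  # isolated node: end the walk early
                walk.append(rng.choice(sorted(neighbors)))  # sort for reproducibility
            walks.append(walk)
    return walks


# Toy database: customers and orders share the "cid" values 1 and 2.
tables = {
    "customers": [{"cid": 1, "city": "Paris"}, {"cid": 2, "city": "Lyon"}],
    "orders": [{"oid": 10, "cid": 1}, {"oid": 11, "cid": 2}],
}
graph = graphify(tables)
walks = random_walks(graph)
print(sorted(graph["val:1"]))  # ['customers#0', 'orders#0']
```

The sketch keys values only by their string form, so `1` and `"1"` collapse into a single node. Any two rows that share a frequent value, such as a common city, also become linked. Spurious links of this kind are plausibly the sort of noisy graph signal that, according to the summary, Leva's downstream supervision is meant to filter.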

Paper ID: 6344
Venue: SIGMOD
Year: 2022
Pagerank: 5.7900867e-05
Overall Rank: 4,970 | 65.46%
DOI: 10.1145/3514221.3517891


Authors

1. Zixuan Zhao
2. Raul Castro Fernandez

Incoming Citations (Sorted by Pagerank)

Showing 9 of 9 citing papers.

Rank	Citing Paper	Year	Venue	Pagerank
2,842	Semantics-aware Dataset Discovery from Data Lakes with Contextualized Column-based Representation Learning	2023	VLDB	8.0366354e-05
3,982	How Large Language Models Will Disrupt Data Management	2023	VLDB	6.5595332e-05
5,439	DiffPrep: Differentiable Data Preprocessing Pipeline Search for Learning over Tabular Data	2023	SIGMOD	5.5034427e-05
5,756	Pneuma: Leveraging LLMs for Tabular Data Representation and Retrieval in an End-to-End System	2025	SIGMOD	5.3387063e-05
7,869	Solo: Data Discovery Using Natural Language Questions Via A Self-Supervised Approach	2023	SIGMOD	4.6275089e-05
8,852	Watchog: A Light-weight Contrastive Learning based Framework for Column Annotation	2023	SIGMOD	4.4313992e-05
10,760	OmniMatch: Joinability Discovery in Data Products	2025	VLDB	4.1905499e-05
10,976	Unstructured Data Fusion for Schema and Data Extraction	2024	SIGMOD	4.1905499e-05
11,057	Enriching Relations with Additional Attributes for ER	2024	VLDB	4.1905499e-05

Outgoing Citations (Sorted by Pagerank)

Showing 10 of 10 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank	Cited Paper	Year	Venue	Pagerank
514	TURL: Table Understanding through Representation Learning	2021	VLDB	0.00021280726
740	Distributed Representations of Tuples for Entity Resolution	2018	VLDB	0.00017358024
901	To Join or Not to Join? Thinking Twice about Joins before Feature Selection	2016	SIGMOD	0.00015462938
1,462	ARDA: Automatic Relational Data Augmentation for Machine Learning	2020	VLDB	0.00011866333
1,742	Auctus: A Dataset Search Engine for Data Discovery and Augmentation	2021	VLDB	0.00010695388
1,914	Creating Embeddings of Heterogeneous Relational Datasets for Data Integration Tasks	2020	SIGMOD	0.00010111859
2,142	LSH Ensemble: Internet-Scale Domain Search	2016	VLDB	9.4461701e-05
3,827	Correlation Sketches for Approximate Join-Correlation Queries	2021	SIGMOD	6.7195959e-05
4,123	Are Key-Foreign Key Joins Safe to Avoid when Learning High-Capacity Classifiers?	2018	VLDB	6.4290005e-05
7,868	Learning Over Dirty Data Without Cleaning	2020	SIGMOD	4.6276013e-05

Semantically Similar Papers

Overall Rank	Paper	Year	Venue	Pagerank
10,269	Database Views as Explanations for Relational Deep Learning	2026	VLDB	4.1905499e-05
3,409	End-to-end Optimization of Machine Learning Prediction Queries	2022	SIGMOD	7.1240791e-05
9,325	Powering In-Database Dynamic Model Slicing for Structured Data Analytics	2024	VLDB	4.351469e-05
9,478	Adda: Towards Efficient in-Database Feature Generation via LLM-based Agents	2025	SIGMOD	4.3300131e-05
1,283	Towards Linear Algebra over Normalized Data	2017	VLDB	0.00012826013
9,885	Scalable and Usable Relational Learning With Automatic Language Bias	2021	SIGMOD	4.2580321e-05
10,488	Data Enhancement for Binary Classification of Relational Data	2025	SIGMOD	4.1905499e-05
9,775	Structure-Aware Machine Learning over Multi-Relational Databases	2021	SIGMOD	4.2815042e-05
1,462	ARDA: Automatic Relational Data Augmentation for Machine Learning	2020	VLDB	0.00011866333
1,914	Creating Embeddings of Heterogeneous Relational Datasets for Data Integration Tasks	2020	SIGMOD	0.00010111859