Database Paper Browser

ARDA: Automatic Relational Data Augmentation for Machine Learning

Summary: ARDA automates relational data augmentation for machine learning by joining the input data with external data to produce additional features. It has two components: a data search-and-join component that assembles the augmented features, and an efficient feature-selection component that prunes noisy features. The approach is validated experimentally on multiple datasets. (summarized by gpt-5-nano on Feb 09 2026)
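The summary describes a two-stage pipeline: join the input table with external tables to generate candidate features, then prune the noisy ones. Below is a minimal, stdlib-only Python sketch of that idea. It is not ARDA's implementation. The helper names (`left_join`, `abs_corr`, `impute`, `select_features`) are hypothetical. The single-key left join, mean imputation of unmatched rows, and the pruning rule are also illustrative assumptions. That rule scores each feature by its absolute Pearson correlation with the target and keeps it only if it beats a set of injected random-noise columns.

```python
import random
from statistics import mean

# Illustrative sketch only: helper names and the selection criterion are
# hypothetical assumptions, not ARDA's actual implementation.


def left_join(base, external, key):
    """Left-join base rows (dicts) with external rows on a single key column.

    Assumes every external row has the same columns. Base rows without a
    match get None for each external column.
    """
    index = {row[key]: row for row in external}
    ext_cols = [c for c in external[0] if c != key] if external else []
    joined = []
    for row in base:
        match = index.get(row[key], {})
        merged = dict(row)
        for col in ext_cols:
            merged[col] = match.get(col)
        joined.append(merged)
    return joined


def abs_corr(xs, ys):
    """Absolute Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / (vx * vy) ** 0.5


def impute(values):
    """Replace None (unmatched join rows) with the mean of the present values."""
    present = [v for v in values if v is not None]
    fill = mean(present) if present else 0.0
    return [fill if v is None else v for v in values]


def select_features(rows, candidates, target, n_noise=5, seed=0):
    """Keep candidate features whose score beats every injected noise column.

    Assumed criterion: score = |Pearson r| with the target; the threshold is
    the best score among n_noise Gaussian random-noise columns.
    """
    rng = random.Random(seed)
    ys = [r[target] for r in rows]
    scores = {c: abs_corr(impute([r[c] for r in rows]), ys) for c in candidates}
    noise = [
        abs_corr([rng.gauss(0.0, 1.0) for _ in rows], ys) for _ in range(n_noise)
    ]
    threshold = max(noise, default=0.0)
    return sorted(c for c, s in scores.items() if s > threshold)
```

For example, joining a base table keyed by `id` with an external table holding one column that is linearly related to the target and one constant column keeps only the informative column. Using injected noise columns as the cutoff gives a data-driven threshold instead of a hand-tuned one.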

Paper ID: 12049
Venue: VLDB
Year: 2020
Pagerank: 0.00011869295
Overall Rank: 1,463 | 89.83%
DOI: 10.14778/3397230.3397235

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 32 of 32 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
1,751 | Auctus: A Dataset Search Engine for Data Discovery and Augmentation | 2021 | VLDB | 1.0683295e-04
3,335 | DeepJoin: Joinable Table Discovery with Pre-trained Language Models | 2023 | VLDB | 7.2065006e-05
3,473 | AI Meets Database: AI4DB and DB4AI | 2021 | SIGMOD | 7.062864e-05
3,727 | Cost-based or Learning-based? A Hybrid Query Optimizer for Query Plan Selection | 2022 | VLDB | 6.8141709e-05
3,750 | Data Acquisition for Improving Machine Learning Models | 2021 | VLDB | 6.7895763e-05
3,824 | Correlation Sketches for Approximate Join-Correlation Queries | 2021 | SIGMOD | 6.7260705e-05
3,942 | Ember: No-Code Context Enrichment via Similarity-Based Keyless Joins | 2022 | VLDB | 6.6114622e-05
4,967 | Leva: Boosting Machine Learning Performance with Relational Embedding Data Augmentation | 2022 | SIGMOD | 5.7956612e-05
5,381 | Selective Data Acquisition in the Wild for Model Charging | 2022 | VLDB | 5.5399508e-05
5,691 | Putting Things into Context: Rich Explanations for Query Answers using Join Graphs | 2021 | SIGMOD | 5.3684557e-05
5,978 | Rotom: A Meta-Learned Data Augmentation Framework for Entity Matching, Data Cleaning, Text Classification, and Beyond | 2021 | SIGMOD | 5.2453012e-05
6,077 | The Fast and the Private: Task-based Dataset Search | 2024 | CIDR | 5.2229324e-05
6,228 | Managing ML Pipelines: Feature Stores and the Coming Wave of Embedding Ecosystems | 2021 | VLDB | 5.1470042e-05
6,270 | MATE: Multi-Attribute Table Extraction | 2022 | VLDB | 5.1337451e-05
6,467 | Tailoring Data Source Distributions for Fairness-aware Data Integration | 2021 | VLDB | 5.0528156e-05
7,179 | Coresets over Multiple Tables for Feature-rich and Data-efficient Machine Learning | 2023 | VLDB | 4.8078895e-05
7,491 | Saibot: A Differentially Private Data Search Platform | 2023 | VLDB | 4.7180617e-05
7,582 | LakeCompass: An End-to-End System for Data Maintenance, Search and Analysis in Data Lakes | 2024 | VLDB | 4.7046388e-05
8,116 | LakeBench: A Benchmark for Discovering Joinable and Unionable Tables in Data Lakes | 2024 | VLDB | 4.581507e-05
8,281 | Optimizing Data Acquisition to Enhance Machine Learning Performance | 2024 | VLDB | 4.5435639e-05
9,438 | Bootleg: Chasing the Tail with Self-Supervised Named Entity Disambiguation | 2021 | CIDR | 4.3425082e-05
9,849 | Reptile: Aggregation-level Explanations for Hierarchical Data | 2022 | SIGMOD | 4.2721228e-05
10,269 | Database Views as Explanations for Relational Deep Learning | 2026 | VLDB | 4.1945683e-05
10,478 | Data Enhancement for Binary Classification of Relational Data | 2025 | SIGMOD | 4.1945683e-05
10,725 | Suna: Scalable Causal Confounder Discovery over Relational Data | 2025 | VLDB | 4.1945683e-05
10,754 | OmniMatch: Joinability Discovery in Data Products | 2025 | VLDB | 4.1945683e-05
10,955 | Data Acquisition for Improving Model Confidence | 2024 | SIGMOD | 4.1945683e-05
10,973 | Unstructured Data Fusion for Schema and Data Extraction | 2024 | SIGMOD | 4.1945683e-05
11,025 | Sampling Methods for Inner Product Sketching | 2024 | VLDB | 4.1945683e-05
11,035 | Relational Query Synthesis ⋈ Decision Tree Learning | 2024 | VLDB | 4.1945683e-05
11,054 | Enriching Relations with Additional Attributes for ER | 2024 | VLDB | 4.1945683e-05
11,220 | Lightweight Materialization for Fast Dashboards Over Joins | 2023 | SIGMOD | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 12 of 12 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Semantically Similar Papers