Are Key-Foreign Key Joins Safe to Avoid when Learning High-Capacity Classifiers?
Summary: Extends join avoidance based on key-foreign key dependencies (KFKDs) from linear models to high-capacity classifiers (decision trees, non-linear SVMs, ANNs) through experiments. Finds that these classifiers are robust to avoiding KFK joins, contrary to prior intuition, and raises new theoretical questions at the intersection of data management and ML; code and data are released. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Vraj Shah
2. Arun Kumar
3. Xiaojin Zhu
Incoming Citations (Sorted by Pagerank)
Showing 9 of 9 citing papers.
| Overall Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1,463 | ARDA: Automatic Relational Data Augmentation for Machine Learning | 2020 | VLDB | 1.1869295e-04 |
| 3,750 | Data Acquisition for Improving Machine Learning Models | 2021 | VLDB | 6.7895763e-05 |
| 3,942 | Ember: No-Code Context Enrichment via Similarity-Based Keyless Joins | 2022 | VLDB | 6.6114622e-05 |
| 4,967 | Leva: Boosting Machine Learning Performance with Relational Embedding Data Augmentation | 2022 | SIGMOD | 5.7956612e-05 |
| 5,691 | Putting Things into Context: Rich Explanations for Query Answers using Join Graphs | 2021 | SIGMOD | 5.3684557e-05 |
| 5,978 | Rotom: A Meta-Learned Data Augmentation Framework for Entity Matching, Data Cleaning, Text Classification, and Beyond | 2021 | SIGMOD | 5.2453012e-05 |
| 7,179 | Coresets over Multiple Tables for Feature-rich and Data-efficient Machine Learning | 2023 | VLDB | 4.8078895e-05 |
| 10,955 | Data Acquisition for Improving Model Confidence | 2024 | SIGMOD | 4.1945683e-05 |
| 11,054 | Enriching Relations with Additional Attributes for ER | 2024 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 14 of 14 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 6,775 | A Unified Transferable Model for ML-Enhanced DBMS | 2022 | CIDR | 4.9299192e-05 |
| 9,486 | Quantifying the Loss of Acyclic Join Dependencies | 2023 | PODS | 4.3341665e-05 |
| 2,302 | Nearest Neighbor Classifiers over Incomplete Information: From Certain Answers to Certain Predictions | 2021 | VLDB | 9.0668832e-05 |
| 7,179 | Coresets over Multiple Tables for Feature-rich and Data-efficient Machine Learning | 2023 | VLDB | 4.8078895e-05 |
| 5,861 | Machine Learning for Databases | 2021 | VLDB | 5.298883e-05 |
| 9,776 | Structure-Aware Machine Learning over Multi-Relational Databases | 2021 | SIGMOD | 4.2856106e-05 |
| 6,347 | A Relational Framework for Classifier Engineering | 2017 | PODS | 5.1019568e-05 |
| 12,692 | Decision Tables: Scalable Classification Exploring RDBMS Capabilities | 2000 | VLDB | 4.1945683e-05 |
| 1,167 | Learning Generalized Linear Models Over Normalized Data | 2015 | SIGMOD | 1.3547713e-04 |
| 903 | To Join or Not to Join? Thinking Twice about Joins before Feature Selection | 2016 | SIGMOD | 1.547016e-04 |