Data Integration and Machine Learning: A Natural Synergy
Summary: A tutorial on the synergy between data integration and machine learning, highlighting ML-driven automation and human-in-the-loop pipelines to cut costs and boost accuracy. It surveys ML-based data integration, explains the need for clean, integrated data in end-to-end ML, and outlines cross-domain challenges. (summarized by gpt-5-nano on Feb 09 2026)
Incoming Citations (Sorted by Pagerank)
Showing 10 of 10 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 2,122 | SystemDS: A Declarative Machine Learning System for the End-to-End Data Science Lifecycle | 2020 | CIDR | 9.4989076e-05 |
| 3,140 | ZeroER: Entity Resolution using Zero Labeled Examples | 2020 | SIGMOD | 7.4841763e-05 |
| 3,396 | Automatic Data Repair: Are We Ready to Deploy? | 2024 | VLDB | 7.1455126e-05 |
| 4,774 | LIMA: Fine-grained Lineage Tracing and Reuse in Machine Learning Systems | 2021 | SIGMOD | 5.9316087e-05 |
| 5,869 | Demonstration of Panda: A Weakly Supervised Entity Matching System | 2021 | VLDB | 5.2959029e-05 |
| 7,634 | ReStore - Neural Data Completion for Relational Databases | 2021 | SIGMOD | 4.6911382e-05 |
| 9,409 | Ground Truth Inference for Weakly Supervised Entity Matching | 2023 | SIGMOD | 4.3441378e-05 |
| 9,777 | Data Augmentation for ML-driven Data Preparation and Integration | 2021 | VLDB | 4.2856106e-05 |
| 11,504 | LES3: Learning-based Exact Set Similarity Search | 2021 | VLDB | 4.1945683e-05 |
| 11,629 | Leveraging Organizational Resources to Adapt Models to New Data Modalities | 2020 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 25 of 25 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1,420 | Data Management Challenges in Production Machine Learning | 2017 | SIGMOD | 0.00012057956 |
| 4,906 | Machine Learning for Big Data | 2013 | SIGMOD | 5.8389053e-05 |
| 7,655 | Machine Learning for Cloud Data Systems: the Progress so far and the Path Forward | 2021 | VLDB | 4.6872456e-05 |
| 48 | Data Integration: A Theoretical Perspective | 2002 | PODS | 0.00069720859 |
| 398 | Big Data Integration | 2013 | VLDB | 0.00024372588 |
| 1,532 | Data Management in Machine Learning: Challenges, Techniques, and Systems | 2017 | SIGMOD | 0.00011472681 |
| 13,244 | Deep Data Integration | 2021 | SIGMOD | - |
| 5,976 | Responsible Data Integration: Next-generation Challenges | 2022 | SIGMOD | 5.245976e-05 |
| 9,777 | Data Augmentation for ML-driven Data Preparation and Integration | 2021 | VLDB | 4.2856106e-05 |
| 7,243 | Data Integration and Machine Learning: A Natural Synergy | 2018 | VLDB | 4.7913666e-05 |