Data Collection and Quality Challenges for Deep Learning
Summary: Examines data collection and quality challenges in deep learning, emphasizing that data is a first-class citizen and that data preparation costs dominate DL workflows. It surveys data collection, data validation and cleaning techniques, and robust and fair training for handling bias and errors, and urges leadership from data management. (summarized by gpt-5-nano on Feb 09 2026)
Incoming Non-self Citations Over Time
Authors
Incoming Citations (Sorted by Pagerank)
Showing 4 of 4 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 5,976 | Responsible Data Integration: Next-generation Challenges | 2022 | SIGMOD | 5.245976e-05 |
| 7,400 | Missing Value Imputation for Multi-attribute Sensor Data Streams via Message Propagation | 2024 | VLDB | 4.7397846e-05 |
| 9,098 | Scapin: Scalable Graph Structure Perturbation by Augmented Influence Maximization | 2023 | SIGMOD | 4.3967784e-05 |
| 9,777 | Data Augmentation for ML-driven Data Preparation and Integration | 2021 | VLDB | 4.2856106e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 6 of 6 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 460 | SeeDB: Efficient Data-Driven Visualization Recommendations to Support Visual Analytics | 2015 | VLDB | 0.00022516069 |
| 939 | Data Lake Management: Challenges and Opportunities | 2019 | VLDB | 0.00015187344 |
| 1,420 | Data Management Challenges in Production Machine Learning | 2017 | SIGMOD | 0.00012057956 |
| 1,532 | Data Management in Machine Learning: Challenges, Techniques, and Systems | 2017 | SIGMOD | 0.00011472681 |
| 2,734 | Controlling False Discoveries During Interactive Data Exploration | 2017 | SIGMOD | 8.2078306e-05 |
| 7,243 | Data Integration and Machine Learning: A Natural Synergy | 2018 | VLDB | 4.7913666e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 5,976 | Responsible Data Integration: Next-generation Challenges | 2022 | SIGMOD | 5.245976e-05 |
| 7,013 | Qualitative Data Cleaning | 2016 | VLDB | 4.8619024e-05 |
| 4,906 | Machine Learning for Big Data | 2013 | SIGMOD | 5.8389053e-05 |
| 7,655 | Machine Learning for Cloud Data Systems: the Progress so far and the Path Forward | 2021 | VLDB | 4.6872456e-05 |
| 8,346 | Deep Learning: Systems and Responsibility | 2021 | SIGMOD | 4.5420668e-05 |
| 507 | Data Quality and Data Cleaning: An Overview | 2003 | SIGMOD | 0.00021473263 |
| 13,244 | Deep Data Integration | 2021 | SIGMOD | - |
| 1,627 | Data Cleaning: Overview and Emerging Challenges | 2016 | SIGMOD | 0.00011086905 |
| 1,420 | Data Management Challenges in Production Machine Learning | 2017 | SIGMOD | 0.00012057956 |
| 1,532 | Data Management in Machine Learning: Challenges, Techniques, and Systems | 2017 | SIGMOD | 0.00011472681 |