Database Paper Browser


Data Collection and Quality Challenges for Deep Learning

Summary: Examines data collection and quality challenges in deep learning, arguing that data should be treated as a first-class citizen and that data preparation costs dominate DL workflows. It surveys data collection, data validation and cleaning techniques, and robust and fair training methods for handling bias and errors, and urges the data management community to take a leading role. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 12221
Venue: VLDB
Year: 2020
Pagerank: 5.0267429e-05
Overall Rank: 6,526 | 54.61%
DOI: 10.14778/3415478.3415562

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 4 of 4 citing papers.


Outgoing Citations (Sorted by Pagerank)

Showing 6 of 6 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.


Semantically Similar Papers

Overall Rank | Paper | Year | Venue | Pagerank
5,976 | Responsible Data Integration: Next-generation Challenges | 2022 | SIGMOD | 5.245976e-05
7,013 | Qualitative Data Cleaning | 2016 | VLDB | 4.8619024e-05
4,906 | Machine Learning for Big Data | 2013 | SIGMOD | 5.8389053e-05
7,655 | Machine Learning for Cloud Data Systems: the Progress so far and the Path Forward | 2021 | VLDB | 4.6872456e-05
8,346 | Deep Learning: Systems and Responsibility | 2021 | SIGMOD | 4.5420668e-05
507 | Data Quality and Data Cleaning: An Overview | 2003 | SIGMOD | 0.00021473263
13,244 | Deep Data Integration | 2021 | SIGMOD | -
1,627 | Data Cleaning: Overview and Emerging Challenges | 2016 | SIGMOD | 0.00011086905
1,420 | Data Management Challenges in Production Machine Learning | 2017 | SIGMOD | 0.00012057956
1,532 | Data Management in Machine Learning: Challenges, Techniques, and Systems | 2017 | SIGMOD | 0.00011472681