Database Paper Browser


Where Is My Training Bottleneck? Hidden Trade-Offs in Deep Learning Preprocessing Pipelines

Summary: Analyzes data preprocessing pipelines from four machine learning domains, exposing bottlenecks and throughput–storage trade-offs. Presents an open-source profiler that auto-tunes preprocessing, delivering 3x to 13x higher throughput while keeping the pipelines functionally equivalent. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 6300
Venue: SIGMOD
Year: 2022
Pagerank: 6.8340435e-05
Overall Rank: 3,698 | 74.28%
DOI: 10.1145/3514221.3517848

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 9 of 9 citing papers.


Outgoing Citations (Sorted by Pagerank)

Showing 3 of 3 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
1,504 | Analyzing and Mitigating Data Stalls in DNN Training | 2021 | VLDB | 0.00011642333
2,170 | tf.data: A Machine Learning Data Processing Framework | 2021 | VLDB | 9.3821603e-05
3,293 | Jointly Optimizing Preprocessing and Inference for DNN-based Visual Analytics | 2021 | VLDB | 7.2629834e-05

Semantically Similar Papers