Database Paper Browser

Analyzing and Mitigating Data Stalls in DNN Training

Summary: Data-pipeline stalls often dominate DNN training time for computer-vision and audio models. The paper's large-scale study covers nine models, four datasets, and three tasks. DS-Analyzer quantifies the stalls, and CoorDL's three data-loading techniques yield speedups of up to 5x over DALI. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 12593
Venue: VLDB
Year: 2021
Pagerank: 0.00011642333
Overall Rank: 1,504 | 89.54%
DOI: 10.14778/3446095.3446100

Authors

Incoming Citations (Sorted by Pagerank)

Showing 16 of 16 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
2,170 | tf.data: A Machine Learning Data Processing Framework | 2021 | VLDB | 9.3821603e-05
2,688 | Accelerating Recommendation System Training by Leveraging Popular Choices | 2022 | VLDB | 8.2991144e-05
3,698 | Where Is My Training Bottleneck? Hidden Trade-Offs in Deep Learning Preprocessing Pipelines | 2022 | SIGMOD | 6.8340435e-05
4,180 | FastFlow: Accelerating Deep Learning Model Training with Smart Offloading of Input Data Pipeline | 2023 | VLDB | 6.3793352e-05
5,552 | GoldMiner: Elastic Scaling of Training Data Pre-Processing Pipelines for Deep Learning | 2023 | SIGMOD | 5.4402488e-05
6,057 | Progressive Compressed Records: Taking a Byte out of Deep Learning Data | 2021 | VLDB | 5.2317752e-05
7,656 | Nautilus: An Optimized System for Deep Transfer Learning over Evolving Training Datasets | 2022 | SIGMOD | 4.6871575e-05
8,348 | FusionFlow: Accelerating Data Preprocessing for Machine Learning with CPU-GPU Cooperation | 2024 | VLDB | 4.5410024e-05
8,735 | TensorSocket: Shared Data Loading for Deep Learning Training | 2026 | SIGMOD | 4.456315e-05
8,737 | Scheduling Data Processing Pipelines for Incremental Training on MLP-based Recommendation Models | 2025 | SIGMOD | 4.456315e-05
9,677 | Apt-Serve: Adaptive Request Scheduling on Hybrid Cache for Scalable LLM Inference Serving | 2025 | SIGMOD | 4.3047774e-05
9,805 | MEMO: Fine-grained Tensor Management For Ultra-long Context LLM Training | 2025 | SIGMOD | 4.2805224e-05
10,183 | Mixtera: A Data Plane for Foundation Model Training | 2026 | SIGMOD | 4.1945683e-05
10,580 | GPEmu: A GPU Emulator for Faster and Cheaper Prototyping and Evaluation of Deep Learning System Research | 2025 | VLDB | 4.1945683e-05
10,770 | cedar: Optimized and Unified Machine Learning Input Data Pipelines | 2025 | VLDB | 4.1945683e-05
10,856 | Analyzing Near-Network Hardware Acceleration with Co-Processing on DPUs | 2025 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 1 of 1 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
683 | Cerebro: A Data System for Optimized Deep Learning Model Selection | 2020 | VLDB | 0.00018195476

Semantically Similar Papers