GoldMiner: Elastic Scaling of Training Data Pre-Processing Pipelines for Deep Learning
Summary: GoldMiner decouples data pre-processing from model training with stateless data workers that elastically pool cluster resources. By automatically extracting stateless pre-processing from pipelines, it scales across nodes, delivering up to 12.1x faster jobs and up to 2.5x better GPU utilization in large clusters. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Hanyu Zhao
2. Zhi Yang
3. Yu Cheng
4. Chao Tian
5. Shiru Ren
6. Wencong Xiao
7. Man Yuan
8. Langshi Chen
9. Kaibo Liu
10. Yang Zhang
11. Yong Li
12. Wei Lin
Incoming Citations (Sorted by Pagerank)
Showing 5 of 5 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 8,348 | FusionFlow: Accelerating Data Preprocessing for Machine Learning with CPU-GPU Cooperation | 2024 | VLDB | 4.5410024e-05 |
| 8,737 | Scheduling Data Processing Pipelines for Incremental Training on MLP-based Recommendation Models | 2025 | SIGMOD | 4.456315e-05 |
| 10,492 | Malleus: Straggler-Resilient Hybrid Parallel Training of Large-scale Models via Malleable Data and Model Parallelization | 2025 | SIGMOD | 4.1945683e-05 |
| 10,580 | GPEmu: A GPU Emulator for Faster and Cheaper Prototyping and Evaluation of Deep Learning System Research | 2025 | VLDB | 4.1945683e-05 |
| 10,770 | cedar: Optimized and Unified Machine Learning Input Data Pipelines | 2025 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 3 of 3 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1,504 | Analyzing and Mitigating Data Stalls in DNN Training | 2021 | VLDB | 0.00011642333 |
| 2,170 | tf.data: A Machine Learning Data Processing Framework | 2021 | VLDB | 9.3821603e-05 |
| 3,698 | Where Is My Training Bottleneck? Hidden Trade-Offs in Deep Learning Preprocessing Pipelines | 2022 | SIGMOD | 6.8340435e-05 |