FusionFlow: Accelerating Data Preprocessing for Machine Learning with CPU-GPU Cooperation
Summary: FusionFlow runs data augmentation on both CPUs and GPUs, using GPU-aware allocation within the GPU's free space and adaptive scheduling to avoid interfering with model training. It yields 16–285% single-machine throughput gains and cuts CPU needs by roughly 50–60% versus CPU-only or remote preprocessing. (summarized by gpt-5-mini on Feb 09 2026)
Authors
1. Taeyoon Kim
2. ChanHo Park
3. Mansur Mukimbekov
4. Heelim Hong
5. Minseok Kim
6. Ze Jin
7. Changdae Kim
8. Ji-Yong Shin
9. Myeongjae Jeon
Incoming Citations (Sorted by Pagerank)
Showing 2 of 2 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 8,735 | TensorSocket: Shared Data Loading for Deep Learning Training | 2026 | SIGMOD | 4.456315e-05 |
| 10,770 | cedar: Optimized and Unified Machine Learning Input Data Pipelines | 2025 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 5 of 5 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1,504 | Analyzing and Mitigating Data Stalls in DNN Training | 2021 | VLDB | 0.00011642333 |
| 2,170 | tf.data: A Machine Learning Data Processing Framework | 2021 | VLDB | 9.3821603e-05 |
| 4,180 | FastFlow: Accelerating Deep Learning Model Training with Smart Offloading of Input Data Pipeline | 2023 | VLDB | 6.3793352e-05 |
| 5,333 | Heterogeneity-Aware Distributed Machine Learning Training via Partial Reduce | 2021 | SIGMOD | 5.5656575e-05 |
| 5,552 | GoldMiner: Elastic Scaling of Training Data Pre-Processing Pipelines for Deep Learning | 2023 | SIGMOD | 5.4402488e-05 |