BAGUA: Scaling up Distributed Learning with System Relaxations
Summary: BAGUA is an MPI-style, modular system for distributed data-parallel training that provides system-relaxation primitives (quantization, decentralization, delayed communication). It enables rapid prototyping of advanced distributed-learning algorithms and delivers up to 2x end-to-end speedups over PyTorch-DDP, Horovod, and BytePS on 128 GPUs. The paper also analyzes the performance tradeoffs of these relaxations across network conditions. (summarized by gpt-5-nano on Feb 09 2026)
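To make "system relaxation" concrete, the sketch below contrasts an exact allreduce-style gradient average with two of the relaxations named above: decentralized averaging, where each worker mixes only with its ring neighbours, and 8-bit quantization of a gradient before it is sent. This is a generic, single-process illustration under stated assumptions, not BAGUA's API. The function names (`allreduce_mean`, `ring_gossip`, `quantize_8bit`, `dequantize_8bit`) are hypothetical, the ring topology with equal 1/3 weights is an assumed mixing scheme, and delayed communication is not modeled.

```python
import random

def allreduce_mean(grads):
    """Exact (centralized) baseline: every worker ends up with the global mean gradient."""
    n, dim = len(grads), len(grads[0])
    mean = [sum(g[i] for g in grads) / n for i in range(dim)]
    return [list(mean) for _ in range(n)]

def ring_gossip(grads):
    """Decentralization (hypothetical sketch): each worker averages only with
    its two ring neighbours, using equal weights of 1/3.

    Assumes at least 3 workers so the left and right neighbours are distinct.
    """
    n = len(grads)
    assert n >= 3, "ring gossip sketch needs at least 3 workers"
    out = []
    for k in range(n):
        left, me, right = grads[(k - 1) % n], grads[k], grads[(k + 1) % n]
        out.append([(left[i] + me[i] + right[i]) / 3 for i in range(len(me))])
    return out

def quantize_8bit(vec):
    """Quantization (hypothetical sketch): encode floats as integers in 0..255
    plus an (offset, scale) pair, using uniform levels between min and max."""
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / 255 if hi > lo else 1.0
    codes = [round((x - lo) / scale) for x in vec]
    return codes, lo, scale

def dequantize_8bit(codes, lo, scale):
    """Reconstruct approximate floats; per-entry error is about scale / 2 at most."""
    return [lo + c * scale for c in codes]

if __name__ == "__main__":
    random.seed(0)
    # Five simulated workers, each holding a 4-dimensional gradient.
    grads = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(5)]
    exact = allreduce_mean(grads)
    gossip = ring_gossip(grads)
    codes, lo, scale = quantize_8bit(grads[0])
    approx = dequantize_8bit(codes, lo, scale)
    print("exact mean (worker 0):", exact[0])
    print("gossip (worker 0):    ", gossip[0])
    print("max quantization error:", max(abs(a - b) for a, b in zip(approx, grads[0])))
```

The tradeoff shows up directly: `ring_gossip` needs only neighbour-to-neighbour messages and preserves the global average across workers, but an individual worker's result differs from the exact mean after one round; `quantize_8bit` shrinks each value to one byte at the cost of a bounded rounding error.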
Authors
1. Shaoduo Gan
2. Jiawei Jiang
3. Binhang Yuan
4. Ce Zhang
5. Xiangru Lian
6. Rui Wang
7. Jianbin Chang
8. Chengjun Liu
9. Hongmei Shi
10. Shengzhuo Zhang
11. Xianghong Li
12. Tengxu Sun
13. Sen Yang
14. Ji Liu
Incoming Citations (Sorted by Pagerank)
Showing 4 of 4 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 7,306 | DAPHNE: An Open and Extensible System Infrastructure for Integrated Data Analysis Pipelines | 2022 | CIDR | 4.7678574e-05 |
| 10,398 | Sequoia: An Accessible and Extensible Framework for Privacy-Preserving Machine Learning over Distributed Data | 2025 | SIGMOD | 4.1945683e-05 |
| 10,492 | Malleus: Straggler-Resilient Hybrid Parallel Training of Large-scale Models via Malleable Data and Model Parallelization | 2025 | SIGMOD | 4.1945683e-05 |
| 10,626 | LobRA: Multi-tenant Fine-tuning over Heterogeneous Data | 2025 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 12 of 12 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.