MiCS: Near-linear Scaling for Training Gigantic Model on Public Cloud
Summary: MiCS minimizes communication by shrinking collective participant sets to exploit heterogeneous cloud bandwidth, avoid slow links, and amortize global gradient synchronization. On AWS it achieves up to 2.89x throughput vs prior systems and near-linear weak scaling (99.4% at 512 GPUs for a 100B model), improving GPU utilization over DeepSpeed in constrained public-cloud networks. (summarized by gpt-5-mini on Feb 09 2026)
Authors
1. Zhen Zhang
2. Shuai Zheng
3. Yida Wang
4. Justin Chiu
5. George Karypis
6. Trishul Chilimbi
7. Mu Li
8. Xin Jin
Incoming Citations (Sorted by Pagerank)
Showing 5 of 5 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 2,902 | PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel | 2023 | VLDB | 7.93939e-05 |
| 3,995 | How Large Language Models Will Disrupt Data Management | 2023 | VLDB | 6.5513237e-05 |
| 7,152 | Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity | 2024 | VLDB | 4.8154191e-05 |
| 9,805 | MEMO: Fine-grained Tensor Management For Ultra-long Context LLM Training | 2025 | SIGMOD | 4.2805224e-05 |
| 10,626 | LobRA: Multi-tenant Fine-tuning over Heterogeneous Data | 2025 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 0 of 0 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.