Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism
Summary: Galvatron automatically explores a large hybrid-parallelism space (data/model/pipeline/tensor), using decision-tree decomposition and a dynamic-programming search, to optimize multi-GPU Transformer training. It outperforms prior systems that support only limited parallelism across varying GPU memory budgets. (summarized by gpt-5-mini on Feb 09 2026)
Authors
1. Xupeng Miao
2. Yujie Wang
3. Youhe Jiang
4. Chunan Shi
5. Xiaonan Nie
6. Hailin Zhang
7. Bin Cui
Incoming Citations (Sorted by Pagerank)
Showing 13 of 13 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 5 of 5 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 411 | PyTorch Distributed: Experiences on Accelerating Data Parallel Training | 2020 | VLDB | 2.3906921e-04 |
| 683 | Cerebro: A Data System for Optimized Deep Learning Model Selection | 2020 | VLDB | 1.8195476e-04 |
| 2,677 | HET: Scaling out Huge Embedding Model Training via Cache-enabled Distributed Framework | 2022 | VLDB | 8.3268401e-05 |
| 5,052 | HET-GMP: A Graph-based System Approach to Scaling Large Embedding Model Training | 2022 | SIGMOD | 5.7337977e-05 |
| 5,333 | Heterogeneity-Aware Distributed Machine Learning Training via Partial Reduce | 2021 | SIGMOD | 5.5656575e-05 |