FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement
Summary: FlexMoE addresses routing imbalance and dataflow fluctuation in sparse MoE training through dynamic expert management and device placement. A scheduling module monitors the data flow and adjusts the expert-to-device mapping on the fly using a lightweight heuristic, improving performance over baselines. (summarized by gpt-5-nano on Feb 09 2026)
Incoming Non-self Citations Over Time
Authors
1. Xiaonan Nie
2. Xupeng Miao
3. Zilong Wang
4. Zichao Yang
5. Jilong Xue
6. Lingxiao Ma
7. Gang Cao
8. Bin Cui
Incoming Citations (Sorted by Pagerank)
Showing 5 of 5 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 7,152 | Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity | 2024 | VLDB | 4.8154191e-05 |
| 7,536 | Angel-PTM: A Scalable and Economical Large-scale Pre-training System in Tencent | 2023 | VLDB | 4.7176331e-05 |
| 8,126 | SDPipe: A Semi-Decentralized Framework for Heterogeneity-aware Pipeline-parallel Training | 2023 | VLDB | 4.5796615e-05 |
| 9,806 | The Image Calculator: 10x Faster Image-AI Inference by Replacing JPEG with Self-designing Storage Format | 2024 | SIGMOD | 4.2805224e-05 |
| 10,492 | Malleus: Straggler-Resilient Hybrid Parallel Training of Large-scale Models via Malleable Data and Model Parallelization | 2025 | SIGMOD | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 7 of 7 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 411 | PyTorch Distributed: Experiences on Accelerating Data Parallel Training | 2020 | VLDB | 0.00023906921 |
| 2,677 | HET: Scaling out Huge Embedding Model Training via Cache-enabled Distributed Framework | 2022 | VLDB | 8.3268401e-05 |
| 4,975 | An Experimental Evaluation of Large Scale GBDT Systems | 2019 | VLDB | 5.79026e-05 |
| 5,052 | HET-GMP: A Graph-based System Approach to Scaling Large Embedding Model Training | 2022 | SIGMOD | 5.7337977e-05 |
| 6,377 | Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism | 2023 | VLDB | 5.0911095e-05 |
| 7,536 | Angel-PTM: A Scalable and Economical Large-scale Pre-training System in Tencent | 2023 | VLDB | 4.7176331e-05 |
| 9,966 | Towards Communication-efficient Vertical Federated Learning Training via Cache-enabled Local Updates | 2022 | VLDB | 4.2269436e-05 |