FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement

Summary: FlexMoE addresses routing imbalance and fluctuating data flow in sparse MoE training through dynamic expert management and device placement. A scheduling module monitors the data flow and adjusts the expert-to-device placement on the fly using a lightweight heuristic, improving training performance over baselines. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 6614
Venue: SIGMOD
Year: 2023
Pagerank: 4.4413307e-05
Overall Rank: 8,807 | 38.80%
DOI: 10.1145/3588964

Incoming Citations (Sorted by Pagerank)

Showing 5 of 5 citing papers.

Rank	Citing Paper	Year	Venue	Pagerank
7,143	Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity	2024	VLDB	4.8143774e-05
7,535	Angel-PTM: A Scalable and Economical Large-scale Pre-training System in Tencent	2023	VLDB	4.7131061e-05
8,117	SDPipe: A Semi-Decentralized Framework for Heterogeneity-aware Pipeline-parallel Training	2023	VLDB	4.5788485e-05
9,787	The Image Calculator: 10x Faster Image-AI Inference by Replacing JPEG with Self-designing Storage Format	2024	SIGMOD	4.2799988e-05
10,502	Malleus: Straggler-Resilient Hybrid Parallel Training of Large-scale Models via Malleable Data and Model Parallelization	2025	SIGMOD	4.1905499e-05

Outgoing Citations (Sorted by Pagerank)

Showing 7 of 7 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank	Cited Paper	Year	Venue	Pagerank
411	PyTorch Distributed: Experiences on Accelerating Data Parallel Training	2020	VLDB	0.00023881138
2,678	HET: Scaling out Huge Embedding Model Training via Cache-enabled Distributed Framework	2022	VLDB	8.3224016e-05
4,924	An Experimental Evaluation of Large Scale GBDT Systems	2019	VLDB	5.8211961e-05
5,169	HET-GMP: A Graph-based System Approach to Scaling Large Embedding Model Training	2022	SIGMOD	5.642415e-05
6,361	Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism	2023	VLDB	5.0903244e-05
7,535	Angel-PTM: A Scalable and Economical Large-scale Pre-training System in Tencent	2023	VLDB	4.7131061e-05
9,965	Towards Communication-efficient Vertical Federated Learning Training via Cache-enabled Local Updates	2022	VLDB	4.2229209e-05

Semantically Similar Papers

Overall Rank	Paper	Year	Venue	Pagerank
5,560	GoldMiner: Elastic Scaling of Training Data Pre-Processing Pipelines for Deep Learning	2023	SIGMOD	5.4350242e-05
3,694	Where Is My Training Bottleneck? Hidden Trade-Offs in Deep Learning Preprocessing Pipelines	2022	SIGMOD	6.8316905e-05
9,786	MEMO: Fine-grained Tensor Management For Ultra-long Context LLM Training	2025	SIGMOD	4.2799988e-05
4,807	Resource Elasticity for Large-Scale Machine Learning	2015	SIGMOD	5.9045148e-05
2,907	PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel	2023	VLDB	7.9322286e-05
7,143	Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity	2024	VLDB	4.8143774e-05
11,486	Pool of Experts: Realtime Querying Specialized Knowledge in Massive Neural Networks	2021	SIGMOD	4.1905499e-05
9,174	MemFlow: Memory-Aware Distributed Deep Learning	2020	SIGMOD	4.3807157e-05
4,175	FastFlow: Accelerating Deep Learning Model Training with Smart Offloading of Input Data Pipeline	2023	VLDB	6.3772575e-05
10,297	Resilience-Aware Elastic Scaling for Cloud-Native Online DL Training on Multi-Tenant GPU Clusters	2026	VLDB	4.1905499e-05