Harmony: Overcoming the Hurdles of GPU Memory Capacity to Train Massive DNN Models on Commodity Servers
Summary: Harmony rethinks GPU memory management and data movement to train massive DNNs on a single commodity server. Its redesigned scheduling and CPU–GPU data paths cut swap traffic by up to 100x and deliver up to 7.6x higher training throughput than optimized virtual-memory baselines. (summarized by gpt-5-nano on Feb 09 2026)
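To see how rescheduling alone can change swap volume, here is a toy model. It is not Harmony's scheduler, and the summary does not say Harmony uses this exact reordering. The model assumes the GPU holds the weights of only `capacity` layers, evicts the least recently used layer, and runs a forward pass split into microbatches. It counts CPU-to-GPU weight copies for two task orders. The function names, the layer-count capacity, and the LRU policy are all hypothetical simplifications.

```python
from collections import OrderedDict


def count_weight_swaps(order, capacity):
    """Count CPU->GPU weight copies for a sequence of (microbatch, layer) tasks.

    Hypothetical toy model: GPU memory is measured in whole layers
    (`capacity`), and eviction is least-recently-used.
    """
    resident = OrderedDict()  # layer id -> None, ordered from least to most recent
    swaps = 0
    for _microbatch, layer in order:
        if layer in resident:
            resident.move_to_end(layer)  # hit: weights already on GPU
            continue
        swaps += 1  # miss: weights must be copied in from host memory
        if len(resident) >= capacity:
            resident.popitem(last=False)  # evict the least recently used layer
        resident[layer] = None
    return swaps


def microbatch_major(num_layers, num_microbatches):
    # Each microbatch traverses every layer before the next microbatch starts.
    return [(m, l) for m in range(num_microbatches) for l in range(num_layers)]


def layer_major(num_layers, num_microbatches):
    # Each layer processes every microbatch before moving to the next layer.
    return [(m, l) for l in range(num_layers) for m in range(num_microbatches)]


if __name__ == "__main__":
    layers, microbatches, capacity = 24, 8, 4
    print(count_weight_swaps(microbatch_major(layers, microbatches), capacity))  # 192
    print(count_weight_swaps(layer_major(layers, microbatches), capacity))       # 24
```

With 24 layers, 8 microbatches, and room for 4 layers, the cyclic microbatch-major order misses on every access (24 × 8 = 192 copies). The layer-major order copies each layer once (24 copies). The toy ignores the cost on the other side: layer-major execution must hold or swap activations for every microbatch at once. A real scheduler has to balance both kinds of traffic.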
Authors
1. Youjie Li
2. Amar Phanishayee
3. Derek Murray
4. Jakub Tarnawski
5. Nam Sung Kim
Incoming Citations (Sorted by Pagerank)
Showing 2 of 2 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 7,152 | Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity | 2024 | VLDB | 4.8154191e-05 |
| 9,326 | BladeDISC: Optimizing Dynamic Shape Machine Learning Workloads via Compiler Approach | 2023 | SIGMOD | 4.3556432e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 1 of 1 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 411 | PyTorch Distributed: Experiences on Accelerating Data Parallel Training | 2020 | VLDB | 0.00023906921 |