How Can We Train Deep Learning Models Across Clouds and Continents? An Experimental Study
Summary: Empirically evaluates cost and throughput of training representative CV/NLP/ASR models on spot VMs across zones, continents, and cloud providers, quantifying geographic and provider trade-offs. Shows hybrid-cloud and many-cheap-VM spot strategies can outperform centralized or on‑demand instances in cost and throughput. (summarized by gpt-5-mini on Feb 09 2026)
Authors
1. Alexander Erben
2. Ruben Mayer
3. Hans-Arno Jacobsen
Incoming Citations (Sorted by Pagerank)
Showing 1 of 1 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 10,492 | Malleus: Straggler-Resilient Hybrid Parallel Training of Large-scale Models via Malleable Data and Model Parallelization | 2025 | SIGMOD | 0.000041945683 |
Outgoing Citations (Sorted by Pagerank)
Showing 1 of 1 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 411 | PyTorch Distributed: Experiences on Accelerating Data Parallel Training | 2020 | VLDB | 0.00023906921 |