CROSSBOW: Scaling Deep Learning with Small Batch Sizes on Multi-GPU Servers
Summary: Crossbow enables deep learning training with small batch sizes on a single multi-GPU server; its synchronous model averaging (SMA) algorithm preserves statistical efficiency across model replicas. It auto-tunes the number of replicas per GPU to maximize throughput, achieving 1.3–4× speedups over TensorFlow on 8 GPUs. (summarized by gpt-5-nano on Feb 09 2026)
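To make the two mechanisms in the summary more concrete, here is a toy Python sketch. It is not Crossbow's implementation. `sma_train` and `tune_replicas` are hypothetical names, the model is a single scalar fitted by least squares, and the update is an assumed, simplified elastic-averaging-style rule: each replica takes an SGD step on its own small batch and is pulled toward a central average model, which absorbs the sum of those pulls. It illustrates the general idea of synchronous model averaging, not the paper's exact SMA algorithm. The replica tuner is likewise an assumed greedy search that adds replicas per GPU while a caller-supplied throughput measurement keeps improving.

```python
from typing import Callable, List


def sma_train(batches: List[List[float]], lr: float = 0.1, alpha: float = 0.1,
              steps: int = 300) -> float:
    """Hypothetical toy model averaging on a 1-D least-squares problem.

    Replica j owns batch j and minimises mean((w - x)^2) / 2 over it, so its
    gradient is w_j - mean(batch_j). Simplified elastic-averaging-style rule,
    assumed for illustration; not Crossbow's exact SMA update.
    """
    k = len(batches)
    means = [sum(b) / len(b) for b in batches]
    replicas = [0.0] * k  # one model replica per learner
    central = 0.0         # central average model z
    for _ in range(steps):
        # Synchronous step: every correction is computed against the same
        # central value before any model is modified.
        corrections = [alpha * (w - central) for w in replicas]
        # Local gradient step on the replica's own batch, minus its correction.
        replicas = [w - lr * (w - m) - c
                    for w, m, c in zip(replicas, means, corrections)]
        # The central model absorbs the sum of all corrections.
        central += sum(corrections)
    return central


def tune_replicas(measure_throughput: Callable[[int], float],
                  max_replicas: int = 8) -> int:
    """Hypothetical greedy tuner: add replicas per GPU while throughput improves."""
    best_n, best_tp = 1, measure_throughput(1)
    for n in range(2, max_replicas + 1):
        tp = measure_throughput(n)
        if tp <= best_tp:
            break  # throughput stopped improving; keep the previous count
        best_n, best_tp = n, tp
    return best_n
```

With four equally sized batches, `sma_train([[1.0, 2.0], [3.0, 5.0], [0.0, 4.0], [2.0, 2.0]])` converges to about 2.375, the mean of all eight points and hence the minimizer a single model would reach on the combined data, even though each replica only ever sees its own two-point batch. For the tuner, passing the throughput table `{1: 100.0, 2: 180.0, 3: 230.0, 4: 220.0}.__getitem__` with `max_replicas=4` returns 3, the last replica count before throughput dropped.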
Authors
1. Alexandros Koliousis
2. Pijika Watcharapichat
3. Matthias Weidlich
4. Luo Mai
5. Paolo Costa
6. Peter Pietzuch
Incoming Citations (Sorted by PageRank)
Showing 8 of 8 citing papers.
| Rank | Citing Paper | Year | Venue | PageRank |
|---|---|---|---|---|
| 683 | Cerebro: A Data System for Optimized Deep Learning Model Selection | 2020 | VLDB | 1.8195476e-04 |
| 3,327 | Pump Up the Volume: Processing Large Data on GPUs with Fast Interconnects | 2020 | SIGMOD | 7.2205738e-05 |
| 4,002 | MG-Join: A Scalable Join for Massively Parallel Multi-GPU Architectures | 2021 | SIGMOD | 6.545665e-05 |
| 4,557 | Distributed Deep Learning on Data Systems: A Comparative Analysis of Approaches | 2021 | VLDB | 6.087611e-05 |
| 7,152 | Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity | 2024 | VLDB | 4.8154191e-05 |
| 8,712 | ANN Softmax: Acceleration of Extreme Classification Training | 2022 | VLDB | 4.4626362e-05 |
| 8,735 | TensorSocket: Shared Data Loading for Deep Learning Training | 2026 | SIGMOD | 4.456315e-05 |
| 9,806 | The Image Calculator: 10x Faster Image-AI Inference by Replacing JPEG with Self-designing Storage Format | 2024 | SIGMOD | 4.2805224e-05 |
Outgoing Citations (Sorted by PageRank)
Showing 4 of 4 cited papers.
Only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database are counted.
| Rank | Cited Paper | Year | Venue | PageRank |
|---|---|---|---|---|
| 1,044 | DimmWitted: A Study of Main-Memory Statistical Analytics | 2014 | VLDB | 1.4475229e-04 |
| 1,942 | Heterogeneity-aware Distributed Parameter Servers | 2017 | SIGMOD | 1.0012691e-04 |
| 2,440 | FlexPS: Flexible Parallelism Control in Parameter Server Architecture | 2018 | VLDB | 8.8119143e-05 |
| 4,395 | Scalable Asynchronous Gradient Descent Optimization for Out-of-Core Models | 2017 | VLDB | 6.2244283e-05 |