DECK: Experiences on Delta Checkpointing for Industrial Recommendation Systems
Summary: DECK introduces a production-ready delta checkpointing system for multi‑terabyte industrial recommender training. It extracts and streams model-state deltas with near-zero overhead and without halting training. Merging the streamed deltas in a decoupled process raises checkpoint frequency to about 12× with negligible loss of training throughput. (summarized by gpt-5-mini on Feb 09 2026)
Incoming Non-self Citations Over Time
No non-self incoming citations found for this paper in this database.
Authors
1. Xin Gao
2. Sibasish Acharya
3. Sihui Han
4. Yongxiong Ren
5. Yanli Zhao
6. Liang Luo
7. Chucheng Wang
8. Pradeep Fernando
9. Saurabh Mishra
10. Siqi Yan
11. Yicong Du
12. Elzbieta Krepska
13. Intaik Park
14. Min Ni
15. Qunshu Zhang
16. Shen Li
Incoming Citations (Sorted by Pagerank)
Showing 0 of 0 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 2 of 2 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 411 | PyTorch Distributed: Experiences on Accelerating Data Parallel Training | 2020 | VLDB | 0.00023906921 |
| 2,902 | PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel | 2023 | VLDB | 0.0000793939 |