Database Paper Browser

PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel

Summary: Introduces PyTorch Fully Sharded Data Parallel (FSDP), an industry-grade, non-intrusive sharding framework co-designed with PyTorch internals (the Tensor implementation, the dispatcher, and the CUDA caching allocator) to enable training of much larger models than Distributed Data Parallel (DDP) can support. FSDP combines memory and communication optimizations that adapt to a variety of hardware configurations, achieving near-linear TFLOPS scalability and throughput comparable to DDP while substantially reducing the memory footprint. (summarized by gpt-5-mini on Feb 09 2026)

Paper ID: 13212
Venue: VLDB
Year: 2023
Pagerank: 7.93939e-05
Overall Rank: 2,902 | 79.82%
DOI: 10.14778/3611540.3611569

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 11 of 11 citing papers.

Outgoing Citations (Sorted by Pagerank)

Showing 2 of 2 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
411 | PyTorch Distributed: Experiences on Accelerating Data Parallel Training | 2020 | VLDB | 0.00023906921
2,352 | MiCS: Near-linear Scaling for Training Gigantic Model on Public Cloud | 2023 | VLDB | 8.9766205e-05

Semantically Similar Papers