Database Paper Browser

MEMO: Fine-grained Tensor Management For Ultra-long Context LLM Training

Summary: Memo enables ultra-long-context LLM training through fine-grained management of activation memory. It offloads activations to CPU memory after each layer's forward pass and fetches them back during backpropagation, combining this with token-wise recomputation. A bi-level mixed-integer programming (MIP) formulation optimizes memory reuse across layers to curb fragmentation and communication, delivering MFU gains over Megatron-LM and DeepSpeed. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 7048
Venue: SIGMOD
Year: 2025
Pagerank: 4.2805224e-05
Overall Rank: 9,805 | 31.79%
DOI: 10.1145/3709703

Authors

Incoming Citations (Sorted by Pagerank)

Showing 2 of 2 citing papers.

Outgoing Citations (Sorted by Pagerank)

Showing 20 of 20 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
1,504 | Analyzing and Mitigating Data Stalls in DNN Training | 2021 | VLDB | 0.00011642333
2,170 | tf.data: A Machine Learning Data Processing Framework | 2021 | VLDB | 9.3821603e-05
2,330 | Concurrent Analytical Query Processing with GPUs | 2014 | VLDB | 9.0192228e-05
2,352 | MiCS: Near-linear Scaling for Training Gigantic Model on Public Cloud | 2023 | VLDB | 8.9766205e-05
2,677 | HET: Scaling out Huge Embedding Model Training via Cache-enabled Distributed Framework | 2022 | VLDB | 8.3268401e-05
2,902 | PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel | 2023 | VLDB | 7.93939e-05
3,698 | Where Is My Training Bottleneck? Hidden Trade-Offs in Deep Learning Preprocessing Pipelines | 2022 | SIGMOD | 6.8340435e-05
3,898 | Efficient Join Algorithms For Large Database Tables in a Multi-GPU Environment | 2021 | VLDB | 6.6551268e-05
4,047 | Orca: Scalable Temporal Graph Neural Network Training with Theoretical Guarantees | 2023 | SIGMOD | 6.4972105e-05
4,180 | FastFlow: Accelerating Deep Learning Model Training with Smart Offloading of Input Data Pipeline | 2023 | VLDB | 6.3793352e-05
4,701 | Tensors: An abstraction for general data processing | 2021 | VLDB | 5.9866564e-05
5,143 | Memory Management Techniques for Large-Scale Persistent-Main-Memory Systems | 2017 | VLDB | 5.6657259e-05
5,821 | Tensor Relational Algebra for Distributed Machine Learning System Design | 2021 | VLDB | 5.3134851e-05
6,156 | Optimizing Tensor Programs on Flexible Storage | 2023 | SIGMOD | 5.1802603e-05
6,377 | Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism | 2023 | VLDB | 5.0911095e-05
7,014 | SIMPLE: Efficient Temporal Graph Neural Network Training at Scale with Dynamic Data Placement | 2024 | SIGMOD | 4.8616315e-05
8,126 | SDPipe: A Semi-Decentralized Framework for Heterogeneity-aware Pipeline-parallel Training | 2023 | VLDB | 4.5796615e-05
8,157 | TOD: GPU-accelerated Outlier Detection via Tensor Operations | 2023 | VLDB | 4.5730908e-05
9,402 | CAFE: Towards Compact, Adaptive, and Fast Embedding for Large-scale Recommendation Models | 2024 | SIGMOD | 4.3441378e-05
9,408 | Experimental Analysis of Large-scale Learnable Vector Storage Compression | 2024 | VLDB | 4.3441378e-05

Semantically Similar Papers