MEMO: Fine-grained Tensor Management For Ultra-long Context LLM Training
Summary: Memo enables ultra-long context LLM training through fine-grained activation memory management. It offloads activations to CPU memory after each layer and fetches them back during backpropagation, combined with token-wise recomputation. A bi-level mixed-integer program (MIP) optimizes cross-layer memory reuse to curb fragmentation and communication, delivering higher MFU (model FLOPs utilization) than Megatron-LM and DeepSpeed.
(summarized by gpt-5-nano on Feb 09 2026)
- Paper ID: 7048
- Venue: SIGMOD
- Year: 2025
- Pagerank: 4.2805224e-05
- Overall Rank: 9,805 | 31.79%
- DOI: 10.1145/3709703
Incoming Non-self Citations Over Time
Incoming Citations (Sorted by Pagerank)
Showing 2 of 2 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 20 of 20 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
| --- | --- | --- | --- | --- |
| 1,504 | Analyzing and Mitigating Data Stalls in DNN Training | 2021 | VLDB | 0.00011642333 |
| 2,170 | tf.data: A Machine Learning Data Processing Framework | 2021 | VLDB | 9.3821603e-05 |
| 2,330 | Concurrent Analytical Query Processing with GPUs | 2014 | VLDB | 9.0192228e-05 |
| 2,352 | MiCS: Near-linear Scaling for Training Gigantic Model on Public Cloud | 2023 | VLDB | 8.9766205e-05 |
| 2,677 | HET: Scaling out Huge Embedding Model Training via Cache-enabled Distributed Framework | 2022 | VLDB | 8.3268401e-05 |
| 2,902 | PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel | 2023 | VLDB | 7.93939e-05 |
| 3,698 | Where Is My Training Bottleneck? Hidden Trade-Offs in Deep Learning Preprocessing Pipelines | 2022 | SIGMOD | 6.8340435e-05 |
| 3,898 | Efficient Join Algorithms For Large Database Tables in a Multi-GPU Environment | 2021 | VLDB | 6.6551268e-05 |
| 4,047 | Orca: Scalable Temporal Graph Neural Network Training with Theoretical Guarantees | 2023 | SIGMOD | 6.4972105e-05 |
| 4,180 | FastFlow: Accelerating Deep Learning Model Training with Smart Offloading of Input Data Pipeline | 2023 | VLDB | 6.3793352e-05 |
| 4,701 | Tensors: An abstraction for general data processing | 2021 | VLDB | 5.9866564e-05 |
| 5,143 | Memory Management Techniques for Large-Scale Persistent-Main-Memory Systems | 2017 | VLDB | 5.6657259e-05 |
| 5,821 | Tensor Relational Algebra for Distributed Machine Learning System Design | 2021 | VLDB | 5.3134851e-05 |
| 6,156 | Optimizing Tensor Programs on Flexible Storage | 2023 | SIGMOD | 5.1802603e-05 |
| 6,377 | Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism | 2023 | VLDB | 5.0911095e-05 |
| 7,014 | SIMPLE: Efficient Temporal Graph Neural Network Training at Scale with Dynamic Data Placement | 2024 | SIGMOD | 4.8616315e-05 |
| 8,126 | SDPipe: A Semi-Decentralized Framework for Heterogeneity-aware Pipeline-parallel Training | 2023 | VLDB | 4.5796615e-05 |
| 8,157 | TOD: GPU-accelerated Outlier Detection via Tensor Operations | 2023 | VLDB | 4.5730908e-05 |
| 9,402 | CAFE: Towards Compact, Adaptive, and Fast Embedding for Large-scale Recommendation Models | 2024 | SIGMOD | 4.3441378e-05 |
| 9,408 | Experimental Analysis of Large-scale Learnable Vector Storage Compression | 2024 | VLDB | 4.3441378e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
| --- | --- | --- | --- | --- |
| 8,808 | FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement | 2023 | SIGMOD | 4.4454035e-05 |
| 13,088 | OrbitFlow: SLO-Aware Long-Context LLM Serving with Fine-Grained KV Cache Reconfiguration | 2026 | VLDB | - |
| 3,565 | Cache-Craft: Managing Chunk-Caches for Efficient Retrieval-Augmented Generation | 2025 | SIGMOD | 6.9655362e-05 |
| 6,357 | PQCache: Product Quantization-based KVCache for Long Context LLM Inference | 2025 | SIGMOD | 5.0970739e-05 |
| 10,143 | Beluga: A CXL-Based Memory Architecture for Scalable and Efficient LLM KVCache Management | 2026 | SIGMOD | 4.1945683e-05 |
| 8,520 | mLoRA: Fine-Tuning LoRA Adapters via Highly-Efficient Pipeline Parallelism in Multiple GPUs | 2025 | VLDB | 4.4937074e-05 |
| 9,876 | Near-Duplicate Sequence Search at Scale for Large Language Model Memorization Evaluation | 2023 | SIGMOD | 4.2667743e-05 |
| 10,222 | RetroInfer: A Vector Storage Engine for Scalable Long-Context LLM Inference | 2026 | VLDB | 4.1945683e-05 |
| 9,170 | MemFlow: Memory-Aware Distributed Deep Learning | 2020 | SIGMOD | 4.3849075e-05 |
| 7,152 | Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity | 2024 | VLDB | 4.8154191e-05 |