Beluga: A CXL-Based Memory Architecture for Scalable and Efficient LLM KVCache Management
Summary: Beluga uses CXL switches to expose a large, shared memory pool that GPUs and CPUs access with native load/store instructions for KVCache, avoiding the latency and protocol overhead of RDMA. Beluga-KVCache builds on this architecture to scale long-context LLM inference, reducing time-to-first-token (TTFT) by 89.6% and increasing vLLM throughput by 7.35x.
(summarized by gpt-5-mini on Apr 11 2026)
- Paper ID: 7453
- Venue: SIGMOD
- Year: 2026
- Pagerank: 4.1945683e-05
- Overall Rank: 10,143 | 29.44%
- DOI: 10.1145/3786627
Incoming Non-self Citations Over Time
No non-self incoming citations found for this paper in this database.
Incoming Citations (Sorted by Pagerank)
Showing 0 of 0 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
Outgoing Citations (Sorted by Pagerank)
Showing 17 of 17 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1,593 | PolarFS: An Ultra-low Latency and Failure Resilient Distributed File System for Shared Storage Cloud Database | 2018 | VLDB | 0.00011224049 |
| 1,872 | ReAcTable: Enhancing ReAct for Table Question Answering | 2024 | VLDB | 0.00010259702 |
| 2,572 | Efficient Distributed Memory Management with RDMA and Caching | 2018 | VLDB | 8.519943e-05 |
| 3,859 | OpenSearch-SQL: Enhancing Text-to-SQL with Dynamic Few-shot and Consistency Alignment | 2025 | SIGMOD | 6.6907933e-05 |
| 3,936 | Rethinking Database High Availability with RDMA Networks | 2019 | VLDB | 6.6162264e-05 |
| 4,544 | ScaleStore: A Fast and Cost-Efficient Storage Engine using DRAM, NVMe, and RDMA | 2022 | SIGMOD | 6.1000636e-05 |
| 5,042 | Design Guidelines for Correct, Efficient, and Scalable Synchronization using One-Sided RDMA | 2023 | SIGMOD | 5.7414429e-05 |
| 6,223 | Distributed GPU Joins on Fast RDMA-capable Networks | 2023 | SIGMOD | 5.1496398e-05 |
| 6,741 | DEX: Scalable Range Indexing on Disaggregated Memory | 2024 | VLDB | 4.9432931e-05 |
| 6,796 | InferDB: In-Database Machine Learning Inference Using Indexes | 2024 | VLDB | 4.9241624e-05 |
| 7,061 | Serving Deep Learning Models with Deduplication from Relational Databases | 2022 | VLDB | 4.8463881e-05 |
| 7,339 | SpareLLM: Automatically Selecting Task-Specific Minimum-Cost Large Language Models under Equivalence Constraint | 2025 | SIGMOD | 4.7579469e-05 |
| 8,001 | Rethinking Stateful Stream Processing with RDMA | 2022 | SIGMOD | 4.6092573e-05 |
| 8,649 | Zero-sided RDMA: Network-driven Data Shuffling for Disaggregated Heterogeneous Cloud DBMSs | 2024 | SIGMOD | 4.4762914e-05 |
| 8,950 | Unlocking the Potential of CXL for Disaggregated Memory in Cloud-Native Databases | 2025 | SIGMOD | 4.4231907e-05 |
| 9,476 | Adda: Towards Efficient in-Database Feature Generation via LLM-based Agents | 2025 | SIGMOD | 4.3341665e-05 |
| 10,782 | From Scale-Up to Scale-Out: PolarDB's Journey to Achieving 2 Billion tpmC | 2025 | VLDB | 4.1945683e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 10,222 | RetroInfer: A Vector Storage Engine for Scalable Long-Context LLM Inference | 2026 | VLDB | 4.1945683e-05 |
| 10,066 | DepCache: A KV Cache Management Framework for GraphRAG with Dependency Attention | 2026 | SIGMOD | 4.1945683e-05 |
| 13,138 | Database Perspective on LLM Inference Systems | 2025 | VLDB | - |
| 8,513 | CXL Memory Performance for In-Memory Data Processing | 2025 | VLDB | 4.4947795e-05 |
| 13,088 | OrbitFlow: SLO-Aware Long-Context LLM Serving with Fine-Grained KV Cache Reconfiguration | 2026 | VLDB | - |
| 3,565 | Cache-Craft: Managing Chunk-Caches for Efficient Retrieval-Augmented Generation | 2025 | SIGMOD | 6.9655362e-05 |
| 9,805 | MEMO: Fine-grained Tensor Management For Ultra-long Context LLM Training | 2025 | SIGMOD | 4.2805224e-05 |
| 10,020 | HotPrefix: Hotness-Aware KV Cache Scheduling for Efficient Prefix Sharing in LLM Inference Systems | 2026 | SIGMOD | 4.1945683e-05 |
| 8,950 | Unlocking the Potential of CXL for Disaggregated Memory in Cloud-Native Databases | 2025 | SIGMOD | 4.4231907e-05 |
| 6,357 | PQCache: Product Quantization-based KVCache for Long Context LLM Inference | 2025 | SIGMOD | 5.0970739e-05 |