Database Paper Browser

Beluga: A CXL-Based Memory Architecture for Scalable and Efficient LLM KVCache Management

Summary: Beluga uses CXL switches to expose a large shared memory pool that GPUs and CPUs access with native load/store instructions for KVCache, avoiding the latency and protocol overhead of RDMA. Beluga-KVCache builds on this architecture to scale long-context LLM inference, reducing time to first token (TTFT) by 89.6% and increasing vLLM throughput by 7.35x. (summarized by gpt-5-mini on Apr 11 2026)

Paper ID: 7453
Venue: SIGMOD
Year: 2026
Pagerank: 4.1945683e-05
Overall Rank: 10,143 | 29.44%
DOI: 10.1145/3786627

Incoming Non-self Citations Over Time

No non-self incoming citations found for this paper in this database.

Incoming Citations (Sorted by Pagerank)

Showing 0 of 0 citing papers.

Outgoing Citations (Sorted by Pagerank)

Showing 17 of 17 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
1,593 | PolarFS: An Ultra-low Latency and Failure Resilient Distributed File System for Shared Storage Cloud Database | 2018 | VLDB | 0.00011224049
1,872 | ReAcTable: Enhancing ReAct for Table Question Answering | 2024 | VLDB | 0.00010259702
2,572 | Efficient Distributed Memory Management with RDMA and Caching | 2018 | VLDB | 8.519943e-05
3,859 | OpenSearch-SQL: Enhancing Text-to-SQL with Dynamic Few-shot and Consistency Alignment | 2025 | SIGMOD | 6.6907933e-05
3,936 | Rethinking Database High Availability with RDMA Networks | 2019 | VLDB | 6.6162264e-05
4,544 | ScaleStore: A Fast and Cost-Efficient Storage Engine using DRAM, NVMe, and RDMA | 2022 | SIGMOD | 6.1000636e-05
5,042 | Design Guidelines for Correct, Efficient, and Scalable Synchronization using One-Sided RDMA | 2023 | SIGMOD | 5.7414429e-05
6,223 | Distributed GPU Joins on Fast RDMA-capable Networks | 2023 | SIGMOD | 5.1496398e-05
6,741 | DEX: Scalable Range Indexing on Disaggregated Memory | 2024 | VLDB | 4.9432931e-05
6,796 | InferDB: In-Database Machine Learning Inference Using Indexes | 2024 | VLDB | 4.9241624e-05
7,061 | Serving Deep Learning Models with Deduplication from Relational Databases | 2022 | VLDB | 4.8463881e-05
7,339 | SpareLLM: Automatically Selecting Task-Specific Minimum-Cost Large Language Models under Equivalence Constraint | 2025 | SIGMOD | 4.7579469e-05
8,001 | Rethinking Stateful Stream Processing with RDMA | 2022 | SIGMOD | 4.6092573e-05
8,649 | Zero-sided RDMA: Network-driven Data Shuffling for Disaggregated Heterogeneous Cloud DBMSs | 2024 | SIGMOD | 4.4762914e-05
8,950 | Unlocking the Potential of CXL for Disaggregated Memory in Cloud-Native Databases | 2025 | SIGMOD | 4.4231907e-05
9,476 | Adda: Towards Efficient in-Database Feature Generation via LLM-based Agents | 2025 | SIGMOD | 4.3341665e-05
10,782 | From Scale-Up to Scale-Out: PolarDB's Journey to Achieving 2 Billion tpmC | 2025 | VLDB | 4.1945683e-05

Semantically Similar Papers