Database Paper Browser


HotPrefix: Hotness-Aware KV Cache Scheduling for Efficient Prefix Sharing in LLM Inference Systems

Summary: HotPrefix is a hotness-aware KV-cache scheduler that tracks long-tail prefix reuse, pins hot prefixes in GPU HBM, and offloads cold prefixes to CPU RAM. It overlaps KV transfers with computation, which avoids redundant KV recomputation, reduces HBM pressure, and improves latency and throughput for shared-prefix LLM inference. (Summarized by gpt-5-mini on Feb 11 2026.)
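The summary does not say how HotPrefix measures hotness or decides placement. The following is a hypothetical sketch of the general idea, assuming a decayed per-prefix access counter and a GPU tier whose capacity is counted in whole prefixes. The class name, `decay`, `gpu_capacity`, and the "gpu"/"cpu" tier labels are invented for illustration. It moves no real memory and does not model overlapping transfers with computation, so it should not be read as the authors' algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class TwoTierPrefixCache:
    """Hypothetical sketch (not HotPrefix itself): keep the hottest cached
    prefixes in a bounded 'gpu' tier (standing in for HBM) and demote the
    rest to a 'cpu' tier (standing in for host RAM)."""
    gpu_capacity: int              # assumed: max number of prefixes pinned on GPU
    decay: float = 0.5             # assumed: hotness multiplier applied by tick()
    hotness: dict = field(default_factory=dict)
    gpu: set = field(default_factory=set)
    cpu: set = field(default_factory=set)

    def access(self, prefix: str) -> str:
        """Record a request reusing `prefix`; return the tier it was served from."""
        self.hotness[prefix] = self.hotness.get(prefix, 0.0) + 1.0
        if prefix in self.gpu:
            source = "gpu"    # KV already resident in HBM
        elif prefix in self.cpu:
            source = "cpu"    # KV would be copied back instead of recomputed
        else:
            source = "miss"   # KV must be computed from scratch
            self.cpu.add(prefix)
        self._rebalance()
        return source

    def tick(self) -> None:
        """Age every hotness score so prefixes that stop being reused cool down."""
        for p in self.hotness:
            self.hotness[p] *= self.decay
        self._rebalance()

    def _rebalance(self) -> None:
        """Place the `gpu_capacity` hottest cached prefixes in the gpu tier."""
        cached = self.gpu | self.cpu
        # Sort by descending hotness; break ties by name for determinism.
        ranked = sorted(cached, key=lambda p: (-self.hotness[p], p))
        self.gpu = set(ranked[: self.gpu_capacity])
        self.cpu = cached - self.gpu
```

With `gpu_capacity=1`, a repeatedly used system prompt stays pinned on the GPU tier while a rarely used prefix sits in the CPU tier. After `tick()` ages the scores, renewed requests for the colder prefix can promote it and demote the other. Rebalancing on every access keeps the sketch short. A real scheduler would likely rebalance less often and batch the resulting transfers.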

Paper ID: 7325
Venue: SIGMOD
Year: 2026
Pagerank: 4.1945683e-05
Overall Rank: 10,020 | 30.30%
DOI: 10.1145/3749168

Incoming Non-self Citations Over Time

No non-self incoming citations found for this paper in this database.


Incoming Citations (Sorted by Pagerank)

Showing 0 of 0 citing papers.


Outgoing Citations (Sorted by Pagerank)

Showing 1 of 1 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
695 | 2Q: A Low Overhead High Performance Buffer Management Replacement Algorithm | 1994 | VLDB | 0.00018061376
