HotPrefix: Hotness-Aware KV Cache Scheduling for Efficient Prefix Sharing in LLM Inference Systems
Summary: HotPrefix is a hotness-aware KV-cache scheduler that tracks long-tail prefix reuse, pins hot prefixes in GPU HBM, and offloads cold prefixes to CPU RAM. It overlaps KV transfers with computation to avoid redundant KV recomputation, reduce HBM pressure, and improve latency and throughput for shared-prefix LLM inference. (summarized by gpt-5-mini on Feb 11 2026)
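To make the hot/cold split in the summary concrete, here is a minimal Python sketch of a two-tier prefix cache. It is not the paper's algorithm: the class name `TwoTierPrefixCache`, the `gpu_slots` parameter, and the plain access-count hotness metric are all assumptions chosen for illustration, and the sketch does not model KV transfers or their overlap with computation.

```python
from dataclasses import dataclass, field


@dataclass
class TwoTierPrefixCache:
    """Toy model (hypothetical): each prefix lives in a small 'gpu' tier or a 'cpu' tier."""
    gpu_slots: int                               # assumed capacity of the fast tier, in prefixes
    hotness: dict = field(default_factory=dict)  # prefix -> access count (assumed hotness metric)
    gpu: set = field(default_factory=set)        # prefixes treated as resident in HBM
    cpu: set = field(default_factory=set)        # prefixes treated as offloaded to CPU RAM

    def access(self, prefix: str) -> str:
        """Record one request for `prefix` and report where its KV cache was found."""
        self.hotness[prefix] = self.hotness.get(prefix, 0) + 1
        if prefix in self.gpu:
            return "gpu_hit"
        # A CPU hit would need a transfer; a miss would need recomputation.
        outcome = "cpu_hit" if prefix in self.cpu else "miss"
        self.cpu.discard(prefix)
        self._place(prefix)
        return outcome

    def _place(self, prefix: str) -> None:
        """Put `prefix` in the GPU tier if there is room or if it is hotter than the coldest resident."""
        if len(self.gpu) < self.gpu_slots:
            self.gpu.add(prefix)
            return
        if not self.gpu:  # gpu_slots == 0: everything stays in the CPU tier
            self.cpu.add(prefix)
            return
        coldest = min(self.gpu, key=lambda p: self.hotness[p])
        if self.hotness[prefix] > self.hotness[coldest]:
            # Demote the coldest resident prefix and promote the hotter one.
            self.gpu.remove(coldest)
            self.cpu.add(coldest)
            self.gpu.add(prefix)
        else:
            self.cpu.add(prefix)


cache = TwoTierPrefixCache(gpu_slots=1)
trace = ["sys", "sys", "rare", "rare", "rare"]
outcomes = [cache.access(p) for p in trace]
```

In this trace the prefix `"rare"` stays in the CPU tier until its access count exceeds that of `"sys"`, at which point the two swap tiers. A real scheduler would presumably use a richer hotness signal and account for transfer cost; this sketch only shows the placement decision.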
Incoming Non-self Citations Over Time
No non-self incoming citations found for this paper in this database.
Authors
1. Yuhang Li
2. Rong Gu
3. Chengying Huan
4. Zhibin Wang
5. Renjie Yao
6. Chen Tian
7. Guihai Chen
Incoming Citations (Sorted by Pagerank)
Showing 0 of 0 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
Outgoing Citations (Sorted by Pagerank)
Showing 1 of 1 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 695 | 2Q: A Low Overhead High Performance Buffer Management Replacement Algorithm | 1994 | VLDB | 0.00018061376 |