PQCache: Product Quantization-based KVCache for Long Context LLM Inference
Summary: PQCache compresses the KVCache for long-context LLM inference with Product Quantization (PQ), treating KVCache management as an embedding retrieval problem. PQ codes and centroids are built during prefilling; during autoregressive decoding they are used to approximately select the most relevant keys, so that only the corresponding K/V pairs are fetched, which reduces overhead. (summarized by gpt-5-nano on Feb 09 2026)
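The selection step described in the summary can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`train_pq`, `approx_scores`, `sparse_attention`), the plain Lloyd k-means, and the single-head, single-query setup are all assumptions made for illustration, and system-level details of PQCache are not covered on this page.

```python
import numpy as np

def train_pq(keys, n_sub, n_centroids, n_iter=10, seed=0):
    """Hypothetical helper: fit one k-means codebook per sub-vector slice.

    Returns centroids of shape (n_sub, n_centroids, sub_dim) and
    integer codes of shape (n_tokens, n_sub).
    """
    rng = np.random.default_rng(seed)
    n, d = keys.shape
    assert d % n_sub == 0, "head dimension must split evenly into sub-vectors"
    sub_dim = d // n_sub
    centroids = np.empty((n_sub, n_centroids, sub_dim))
    codes = np.empty((n, n_sub), dtype=np.int64)
    for m in range(n_sub):
        x = keys[:, m * sub_dim:(m + 1) * sub_dim]
        # Initialise centroids from distinct data points.
        c = x[rng.choice(n, n_centroids, replace=False)].copy()
        for _ in range(n_iter):
            dist = ((x[:, None, :] - c[None, :, :]) ** 2).sum(-1)
            assign = dist.argmin(1)
            for j in range(n_centroids):
                members = x[assign == j]
                if len(members):
                    c[j] = members.mean(0)
        # Final assignment against the final centroids.
        dist = ((x[:, None, :] - c[None, :, :]) ** 2).sum(-1)
        codes[:, m] = dist.argmin(1)
        centroids[m] = c
    return centroids, codes

def approx_scores(query, centroids, codes):
    """Approximate query-key inner products using only centroids and codes."""
    n_sub, _, sub_dim = centroids.shape
    q = query.reshape(n_sub, sub_dim)
    # table[m, k] = inner product of query slice m with centroid k of slice m.
    table = np.einsum("md,mkd->mk", q, centroids)
    # Look up each token's code per slice and sum over slices.
    return table[np.arange(n_sub), codes].sum(1)

def sparse_attention(query, keys, values, centroids, codes, top_k):
    """Pick top_k tokens by approximate score, then attend exactly over them."""
    scores = approx_scores(query, centroids, codes)
    idx = np.argpartition(-scores, top_k - 1)[:top_k]
    logits = keys[idx] @ query / np.sqrt(query.shape[0])
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return weights @ values[idx], idx
```

In this sketch only the codes and centroids are consulted to rank tokens; the full keys and values are read only for the `top_k` selected tokens, which is the source of the saving the summary refers to.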
Authors
1. Hailin Zhang
2. Xiaodong Ji
3. Yilin Chen
4. Fangcheng Fu
5. Xupeng Miao
6. Xiaonan Nie
7. Weipeng Chen
8. Bin Cui
Incoming Citations (Sorted by Pagerank)
Showing 4 of 4 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 9,103 | AlayaDB: The Data Foundation for Efficient and Effective Long-context LLM Inference | 2025 | SIGMOD | 4.3958197e-05 |
| 9,677 | Apt-Serve: Adaptive Request Scheduling on Hybrid Cache for Scalable LLM Inference Serving | 2025 | SIGMOD | 4.3047774e-05 |
| 10,066 | DepCache: A KV Cache Management Framework for GraphRAG with Dependency Attention | 2026 | SIGMOD | 4.1945683e-05 |
| 10,222 | RetroInfer: A Vector Storage Engine for Scalable Long-Context LLM Inference | 2026 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 10 of 10 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.