From Prefix Cache to Fusion RAG Cache: Accelerating LLM Inference in Retrieval-Augmented Generation
Summary: FusionRAG reuses the KV cache in RAG inference beyond what prefix caching allows. Offline, chunk fusion injects cross-chunk context; online, selective KV recomputation of attention-critical tokens preserves quality while cutting time to first token (TTFT), outperforming prior KVCache reuse methods at the same recompute budget. (summarized by gpt-5-mini on Apr 11 2026)
Incoming Non-self Citations Over Time
No non-self incoming citations found for this paper in this database.
Authors
1. Jiahao Wang
2. Weiyu Xie
3. Mingxing Zhang
4. Boxing Zhang
5. Jianwei Dong
6. Yuening Zhu
7. Chen Lin
8. Jingqi Tang
9. Yaochen Han
10. Zhiyuan Ai
11. Xianglin Chen
12. Yongwei Wu
13. Congfeng Jiang
Incoming Citations (Sorted by Pagerank)
Showing 0 of 0 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 1 of 1 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 3,565 | Cache-Craft: Managing Chunk-Caches for Efficient Retrieval-Augmented Generation | 2025 | SIGMOD | 6.9655362e-05 |