Cache-Craft: Managing Chunk-Caches for Efficient Retrieval-Augmented Generation
Summary: Cache-Craft manages chunk-caches, the precomputed key/value (K/V) tensors of retrieved text chunks, so that RAG pipelines can reuse them instead of repeating attention computation. It identifies reusable chunk-caches, performs targeted recomputation to preserve output quality, and manages chunk-cache eviction in hardware to maximize reuse. It reports a 51% reduction in recomputation and up to 2x throughput on LLaMA models. (summarized by gpt-5-nano on Feb 09 2026)
Incoming Non-self Citations Over Time
Authors
1. Shubham Agarwal
2. Sai Sundaresan
3. Subrata Mitra
4. Debabrata Mahapatra
5. Archit Gupta
6. Rounak Sharma
7. Nirmal Joshua Kapu
8. Tong Yu
9. Shiv Saini
Incoming Citations (Sorted by Pagerank)
Showing 3 of 3 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 7,931 | In-depth Analysis of Graph-based RAG in a Unified Framework | 2025 | VLDB | 4.613363e-05 |
| 10,066 | DepCache: A KV Cache Management Framework for GraphRAG with Dependency Attention | 2026 | SIGMOD | 4.1945683e-05 |
| 10,170 | From Prefix Cache to Fusion RAG Cache: Accelerating LLM Inference in Retrieval-Augmented Generation | 2026 | SIGMOD | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 7 of 7 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 449 | Approximate Query Processing: Taming the TeraBytes! A Tutorial | 2001 | VLDB | 0.00022846068 |
| 1,204 | VerdictDB: Universalizing Approximate Query Processing | 2018 | SIGMOD | 0.00013319541 |
| 1,366 | SlimDB: A Space-Efficient Key-Value Storage Engine For Semi-Sorted Data | 2017 | VLDB | 0.00012357685 |
| 4,200 | New Trends in High-D Vector Similarity Search: AI-driven, Progressive, and Distributed | 2021 | VLDB | 6.3651489e-05 |
| 5,048 | Put an Elephant into a Fridge: Optimizing Cache Efficiency for In-memory Key-value Stores | 2020 | VLDB | 5.7378052e-05 |
| 6,360 | High-Dimensional Vector Similarity Search: From Time Series to Deep Network Embeddings | 2020 | SIGMOD | 5.0961051e-05 |
| 7,235 | Catalyst: Optimizing Cache Management for Large In-memory Key-value Systems | 2023 | VLDB | 4.7937267e-05 |