Database Paper Browser

RetroInfer: A Vector Storage Engine for Scalable Long-Context LLM Inference

Summary: RetroInfer reframes KV-cache management for long-context LLM inference as a vector storage problem: it offloads the cache to CPU memory and retrieves only the attention-relevant tokens. Its wave index and wave buffer jointly address the accuracy/cost tradeoff of sparse attention and the cost of GPU-CPU data movement, yielding full-attention-level accuracy at much higher throughput. (summarized by gpt-5.4-mini on Apr 12 2026)

Paper ID: 14256
Venue: VLDB
Year: 2026
Pagerank: 4.1945683e-05
Overall Rank: 10,222 | 28.89%
DOI: 10.14778/3796195.3796212

Incoming Non-self Citations Over Time

No non-self incoming citations found for this paper in this database.

Incoming Citations (Sorted by Pagerank)

Showing 0 of 0 citing papers.


Outgoing Citations (Sorted by Pagerank)

Showing 20 of 20 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
212 | Fast Approximate Nearest Neighbor Search With The Navigating Spreading-out Graph | 2019 | VLDB | 0.00033913475
495 | Milvus: A Purpose-Built Vector Data Management System | 2021 | SIGMOD | 0.00021767688
1,636 | PASE: PostgreSQL Ultra-High-Dimensional Approximate Nearest Neighbor Search Extension | 2020 | SIGMOD | 0.00011053863
2,262 | Manu: A Cloud Native Vector Database Management System | 2022 | VLDB | 9.1624446e-05
2,320 | High-Throughput Vector Similarity Search in Knowledge Graphs | 2023 | SIGMOD | 9.0366225e-05
2,523 | ACORN: Performant and Predicate-Agnostic Search Over Vector Embeddings and Structured Data | 2024 | SIGMOD | 8.604576e-05
2,725 | HVS: Hierarchical Graph Structure Based on Voronoi Diagrams for Solving Approximate Nearest Neighbor Search | 2022 | VLDB | 8.2294908e-05
2,971 | Towards Efficient Index Construction and Approximate Nearest Neighbor Search in High-Dimensional Spaces | 2023 | VLDB | 7.7970531e-05
3,225 | DeltaPQ: Lossless Product Quantization Code Compression for High Dimensional Similarity Search | 2020 | VLDB | 7.3463484e-05
3,680 | SingleStore-V: An Integrated Vector Database System in SingleStore | 2024 | VLDB | 6.8496415e-05
4,544 | ScaleStore: A Fast and Cost-Efficient Storage Engine using DRAM, NVMe, and RDMA | 2022 | SIGMOD | 6.1000636e-05
4,583 | Virtual-Memory Assisted Buffer Management | 2023 | SIGMOD | 6.0676378e-05
5,233 | RoarGraph: A Projected Bipartite Graph for Efficient Cross-Modal Approximate Nearest Neighbor Search | 2024 | VLDB | 5.6131833e-05
6,357 | PQCache: Product Quantization-based KVCache for Long Context LLM Inference | 2025 | SIGMOD | 5.0970739e-05
6,376 | DET-LSH: A Locality-Sensitive Hashing Scheme with Dynamic Encoding Tree for Approximate Nearest Neighbor Search | 2024 | VLDB | 5.0916875e-05
6,389 | Chat2Data: An Interactive Data Analysis System with RAG, Vector Databases and LLMs | 2024 | VLDB | 5.0844009e-05
6,840 | LeanStore: A High-Performance Storage Engine for NVMe SSDs | 2024 | VLDB | 4.9109345e-05
8,687 | TigerVector: Supporting Vector Search in Graph Databases for Advanced RAGs | 2025 | SIGMOD | 4.4675056e-05
9,103 | AlayaDB: The Data Foundation for Efficient and Effective Long-context LLM Inference | 2025 | SIGMOD | 4.3958197e-05
9,677 | Apt-Serve: Adaptive Request Scheduling on Hybrid Cache for Scalable LLM Inference Serving | 2025 | SIGMOD | 4.3047774e-05

Semantically Similar Papers