OrbitFlow: SLO-Aware Long-Context LLM Serving with Fine-Grained KV Cache Reconfiguration
Summary: OrbitFlow is an SLO-aware long-context LLM serving system built on adaptive, fine-grained KV-cache reconfiguration. It combines a lightweight integer linear program (ILP) with runtime feedback to choose per-request layer placements and to defer memory-hungry in-flight requests, cutting offload-induced latency spikes and improving SLO attainment.
(summarized by gpt-5.4-mini on Apr 12 2026)
- Paper ID: 14258
- Venue: VLDB
- Year: 2026
- Pagerank: -
- Overall Rank: 13,088 | 8.95%
- DOI: 10.14778/3796195.3796214
Incoming Non-self Citations Over Time
No non-self incoming citations found for this paper in this database.
Incoming Citations (Sorted by Pagerank)
Showing 0 of 0 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
Outgoing Citations (Sorted by Pagerank)
Showing 0 of 0 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 9,103 | AlayaDB: The Data Foundation for Efficient and Effective Long-context LLM Inference | 2025 | SIGMOD | 4.3958197e-05 |
| 10,143 | Beluga: A CXL-Based Memory Architecture for Scalable and Efficient LLM KVCache Management | 2026 | SIGMOD | 4.1945683e-05 |
| 9,805 | MEMO: Fine-grained Tensor Management For Ultra-long Context LLM Training | 2025 | SIGMOD | 4.2805224e-05 |
| 10,222 | RetroInfer: A Vector Storage Engine for Scalable Long-Context LLM Inference | 2026 | VLDB | 4.1945683e-05 |
| 3,565 | Cache-Craft: Managing Chunk-Caches for Efficient Retrieval-Augmented Generation | 2025 | SIGMOD | 6.9655362e-05 |
| 13,135 | ContextCache: Context-Aware Semantic Cache for Multi-Turn Queries in Large Language Models | 2025 | VLDB | - |
| 10,066 | DepCache: A KV Cache Management Framework for GraphRAG with Dependency Attention | 2026 | SIGMOD | 4.1945683e-05 |
| 10,020 | HotPrefix: Hotness-Aware KV Cache Scheduling for Efficient Prefix Sharing in LLM Inference Systems | 2026 | SIGMOD | 4.1945683e-05 |
| 6,357 | PQCache: Product Quantization-based KVCache for Long Context LLM Inference | 2025 | SIGMOD | 5.0970739e-05 |
| 9,677 | Apt-Serve: Adaptive Request Scheduling on Hybrid Cache for Scalable LLM Inference Serving | 2025 | SIGMOD | 4.3047774e-05 |