Database Paper Browser

Apt-Serve: Adaptive Request Scheduling on Hybrid Cache for Scalable LLM Inference Serving

Summary: A hybrid cache that combines a KV cache with a hidden-state cache allows larger batch sizes under GPU memory limits during LLM inference. An adaptive scheduler, formulated as an optimization problem with formal guarantees, tunes batch composition and yields up to 8.8× the throughput of state-of-the-art systems on 13B–66B models. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 7282
Venue: SIGMOD
Year: 2025
Pagerank: 4.3047774e-05
Overall Rank: 9,677 | 32.68%
DOI: 10.1145/3725394

Incoming Citations (Sorted by Pagerank)

Showing 1 of 1 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
10,222 | RetroInfer: A Vector Storage Engine for Scalable Long-Context LLM Inference | 2026 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 35 of 35 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
329 | Accelerating Machine Learning Inference with Probabilistic Predicates | 2018 | SIGMOD | 0.00027249545
1,160 | Sancus: Staleness-Aware Communication-Avoiding Full-Graph Decentralized Training in Large-Scale Graph Neural Networks | 2022 | VLDB | 0.00013586221
1,504 | Analyzing and Mitigating Data Stalls in DNN Training | 2021 | VLDB | 0.00011642333
2,177 | Accelerating Large Scale Real-Time GNN Inference using Channel Pruning | 2021 | VLDB | 9.359876e-05
2,422 | DUCATI: A Dual-Cache Training System for Graph Neural Networks on Giant Graphs with the GPU | 2023 | SIGMOD | 8.8499665e-05
2,677 | HET: Scaling out Huge Embedding Model Training via Cache-enabled Distributed Framework | 2022 | VLDB | 8.3268401e-05
3,293 | Jointly Optimizing Preprocessing and Inference for DNN-based Visual Analytics | 2021 | VLDB | 7.2629834e-05
3,473 | AI Meets Database: AI4DB and DB4AI | 2021 | SIGMOD | 7.062864e-05
3,698 | Where Is My Training Bottleneck? Hidden Trade-Offs in Deep Learning Preprocessing Pipelines | 2022 | SIGMOD | 6.8340435e-05
3,709 | Zebra: When Temporal Graph Neural Networks Meet Temporal Personalized PageRank | 2023 | VLDB | 6.8242482e-05
4,047 | Orca: Scalable Temporal Graph Neural Network Training with Theoretical Guarantees | 2023 | SIGMOD | 6.4972105e-05
4,180 | FastFlow: Accelerating Deep Learning Model Training with Smart Offloading of Input Data Pipeline | 2023 | VLDB | 6.3793352e-05
5,052 | HET-GMP: A Graph-based System Approach to Scaling Large Embedding Model Training | 2022 | SIGMOD | 5.7337977e-05
5,072 | Optimizing Machine Learning Inference Queries with Correlative Proxy Models | 2022 | VLDB | 5.7185674e-05
5,212 | Self-Tuning Query Scheduling for Analytical Workloads | 2021 | SIGMOD | 5.6262923e-05
5,333 | Heterogeneity-Aware Distributed Machine Learning Training via Partial Reduce | 2021 | SIGMOD | 5.5656575e-05
5,475 | ETC: Efficient Training of Temporal Graph Neural Networks over Large-scale Dynamic Graphs | 2024 | VLDB | 5.4869706e-05
6,357 | PQCache: Product Quantization-based KVCache for Long Context LLM Inference | 2025 | SIGMOD | 5.0970739e-05
6,377 | Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism | 2023 | VLDB | 5.0911095e-05
6,485 | EARLY: Efficient and Reliable Graph Neural Network for Dynamic Graphs | 2023 | SIGMOD | 5.0453531e-05
7,014 | SIMPLE: Efficient Temporal Graph Neural Network Training at Scale with Dynamic Data Placement | 2024 | SIGMOD | 4.8616315e-05
7,152 | Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity | 2024 | VLDB | 4.8154191e-05
7,289 | DAHA: Accelerating GNN Training with Data and Hardware Aware Execution Planning | 2024 | VLDB | 4.7747168e-05
7,388 | Distribution-Based Query Scheduling | 2013 | VLDB | 4.7437725e-05
7,583 | Transaction Scheduling: From Conflicts to Runtime Conflicts | 2023 | SIGMOD | 4.7042034e-05
7,696 | Towards Optimal Transaction Scheduling | 2024 | VLDB | 4.6754222e-05
8,080 | Biathlon: Harnessing Model Resilience for Accelerating ML Inference Pipelines | 2024 | VLDB | 4.5911668e-05
8,126 | SDPipe: A Semi-Decentralized Framework for Heterogeneity-aware Pipeline-parallel Training | 2023 | VLDB | 4.5796615e-05
9,381 | MorphStream: Adaptive Scheduling for Scalable Transactional Stream Processing on Multicores | 2023 | SIGMOD | 4.3459591e-05
9,705 | ETO: Accelerating Optimization of DNN Operators by High-Performance Tensor Program Reuse | 2022 | VLDB | 4.2994163e-05
9,804 | Capsule*: An Out-of-Core Training Mechanism for Colossal GNNs | 2025 | SIGMOD | 4.2805224e-05
9,805 | MEMO: Fine-grained Tensor Management For Ultra-long Context LLM Training | 2025 | SIGMOD | 4.2805224e-05
9,806 | The Image Calculator: 10x Faster Image-AI Inference by Replacing JPEG with Self-designing Storage Format | 2024 | SIGMOD | 4.2805224e-05
9,807 | Demonstration of Accelerating Machine Learning Inference Queries with Correlative Proxy Models | 2022 | VLDB | 4.2805224e-05
13,150 | STile: Searching Hybrid Sparse Formats for Sparse Deep Learning Operators Automatically | 2024 | SIGMOD | -

Semantically Similar Papers