Optimizing Inference Serving on Serverless Platforms
Summary: We present Multi-Buffer Serving (MBS), a framework for optimally batching heterogeneous ML inference on serverless platforms. Analytical models combined with Bayesian optimization choose batches to minimize cost under SLOs, reducing padding overhead and function invocations, with up to 8x cost savings on AWS. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Ahsan Ali
2. Riccardo Pinciroli
3. Feng Yan
4. Evgenia Smirni
Incoming Citations (Sorted by Pagerank)
Showing 2 of 2 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 9,326 | BladeDISC: Optimizing Dynamic Shape Machine Learning Workloads via Compiler Approach | 2023 | SIGMOD | 4.3556432e-05 |
| 10,325 | KEN: An Execution Engine for Unstructured Database Systems | 2026 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 6 of 6 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1,845 | Improving Optimistic Concurrency Control Through Transaction Batching and Operation Reordering | 2019 | VLDB | 0.00010338323 |
| 2,424 | Lambada: Interactive Data Analytics on Cold Data Using Serverless Cloud Infrastructure | 2020 | SIGMOD | 8.8380822e-05 |
| 2,791 | Towards Demystifying Serverless Machine Learning Training | 2021 | SIGMOD | 8.1206618e-05 |
| 4,586 | Transactional Causal Consistency for Serverless Computing | 2020 | SIGMOD | 6.0658825e-05 |
| 5,187 | Stateful Functions as a Service in Action | 2019 | VLDB | 5.6400706e-05 |
| 5,955 | LMFAO: An Engine for Batches of Group-By Aggregates | 2020 | VLDB | 5.2572882e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 924 | Serverless Computing: One Step Forward, Two Steps Back | 2019 | CIDR | 0.00015272958 |
| 10,889 | Off-the-shelf Data Analytics on Serverless | 2024 | CIDR | 4.1945683e-05 |
| 10,595 | Optimized Batch Prompting for Cost-effective LLMs | 2025 | VLDB | 4.1945683e-05 |
| 9,525 | Cost-efficiency and Performance Robustness in Serverless Data Exchange | 2022 | SIGMOD | 4.3310169e-05 |
| 4,961 | Releasing Cloud Databases from the Chains of Performance Prediction Models | 2017 | CIDR | 5.7984657e-05 |
| 11,571 | Serverless Query Processing on a Budget | 2020 | SIGMOD | 4.1945683e-05 |
| 4,687 | Serving and Optimizing Machine Learning Workflows on Heterogeneous Infrastructures | 2023 | VLDB | 5.9986055e-05 |
| 9,677 | Apt-Serve: Adaptive Request Scheduling on Hybrid Cache for Scalable LLM Inference Serving | 2025 | SIGMOD | 4.3047774e-05 |
| 2,791 | Towards Demystifying Serverless Machine Learning Training | 2021 | SIGMOD | 8.1206618e-05 |
| 7,269 | Serverless Data Science - Are We There Yet? A Case Study of Model Serving | 2022 | SIGMOD | 4.7815303e-05 |