Database Paper Browser

Optimizing Inference Serving on Serverless Platforms

Summary: We present Multi-Buffer Serving (MBS), a framework for optimally batching heterogeneous ML inference on serverless platforms. Analytical models combined with Bayesian optimization choose batches to minimize cost under SLOs, reducing padding overhead and function invocations, with up to 8x cost savings on AWS. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 12703
Venue: VLDB
Year: 2022
Pagerank: 4.4166105e-05
Overall Rank: 8,982 | 37.52%
DOI: 10.14778/3547305.3547313

Incoming Non-self Citations Over Time

Incoming Citations (Sorted by Pagerank)

Showing 2 of 2 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
9,326 | BladeDISC: Optimizing Dynamic Shape Machine Learning Workloads via Compiler Approach | 2023 | SIGMOD | 4.3556432e-05
10,325 | KEN: An Execution Engine for Unstructured Database Systems | 2026 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 6 of 6 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Semantically Similar Papers