Database Paper Browser

Towards Observability for Production Machine Learning Pipelines

Summary: The paper argues for end-to-end observability in production ML pipelines to address post-deployment issues such as data shift and silent failures. It proposes a bolt-on data-management architecture that wraps existing tools to support detection, diagnosis, and reaction. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 12911
Venue: VLDB
Year: 2022
Pagerank: 4.3928288e-05
Overall Rank: 9,118 | 36.57%
DOI: 10.14778/3565838.3565853

[Figure: Incoming Non-self Citations Over Time]

Incoming Citations (Sorted by Pagerank)

Showing 1 of 1 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
9,231 | Modyn: Data-Centric Machine Learning Pipeline Orchestration | 2025 | SIGMOD | 4.3690661e-05

Outgoing Citations (Sorted by Pagerank)

Showing 27 of 27 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
18 | On Random Sampling over Joins | 1999 | SIGMOD | 0.00092385438
70 | Hive - A Warehousing Solution Over a Map-Reduce Framework | 2009 | VLDB | 0.00059533166
192 | HoloClean: Holistic Data Repairs with Probabilistic Inference | 2017 | VLDB | 0.00035728858
429 | The Aqua Approximate Query Answering System | 1999 | SIGMOD | 0.00023476494
758 | Deep Unsupervised Cardinality Estimation | 2020 | VLDB | 0.0001706608
791 | ActiveClean: Interactive Data Cleaning For Statistical Modeling | 2016 | VLDB | 0.00016629664
1,323 | Quickr: Lazily Approximating Complex AdHoc Queries in BigData Clusters | 2016 | SIGMOD | 0.00012601997
1,420 | Data Management Challenges in Production Machine Learning | 2017 | SIGMOD | 0.00012057956
1,482 | Automating Large-Scale Data Quality Verification | 2018 | VLDB | 0.00011725533
1,612 | Detecting Data Errors: Where are we and what needs to be done? | 2016 | VLDB | 0.00011142794
1,940 | SliceLine: Fast, Linear-Algebra-based Slice Finding for ML Model Debugging | 2021 | SIGMOD | 0.00010020173
2,152 | MISTIQUE: A System to Store and Query Model Intermediates for Model Diagnosis | 2018 | SIGMOD | 9.4239787e-05
2,163 | Elastic Machine Learning Algorithms in Amazon SageMaker | 2020 | SIGMOD | 9.3949234e-05
2,269 | Ground: A Data Context Service | 2017 | CIDR | 9.147379e-05
2,460 | Combining Quantitative and Logical Data Cleaning | 2016 | VLDB | 8.7617484e-05
4,003 | Data Platform for Machine Learning | 2019 | SIGMOD | 6.54347e-05
4,196 | Overton: A Data System for Monitoring and Improving Machine-Learned Products | 2020 | CIDR | 6.3686231e-05
4,350 | On Biased Reservoir Sampling in the Presence of Stream Evolution | 2006 | VLDB | 6.2645054e-05
4,734 | MLINSPECT: A Data Distribution Debugger for Machine Learning Pipelines | 2021 | SIGMOD | 5.9615384e-05
5,372 | ReproZip: Computational Reproducibility With Ease | 2016 | SIGMOD | 5.5428429e-05
5,684 | Dagger: A Data (not code) Debugger | 2020 | CIDR | 5.3720749e-05
6,493 | Joins on Samples: A Theoretical Guide for Practitioners | 2020 | VLDB | 5.0424713e-05
6,733 | Hindsight Logging for Model Training | 2021 | VLDB | 4.9467666e-05
6,740 | Combining Aggregation and Sampling (Nearly) Optimally for Approximate Query Processing | 2021 | SIGMOD | 4.944395e-05
8,163 | Capturing and Querying Fine-grained Provenance of Preprocessing Pipelines in Data Science | 2021 | VLDB | 4.5723431e-05
9,221 | VisClean: Interactive Cleaning for Progressive Visualization | 2020 | VLDB | 4.3699444e-05
11,313 | Towards Observability for Machine Learning Pipelines | 2022 | CIDR | 4.1945683e-05

Semantically Similar Papers