Towards Observability for Production Machine Learning Pipelines
Summary: The paper argues for end-to-end observability in production ML pipelines to address post-deployment issues such as data shift and silent failures. It proposes a bolt-on data-management architecture that wraps existing tools and supports detection, diagnosis, and reaction.
(summarized by gpt-5-nano on Feb 09 2026)
- Paper ID: 12911
- Venue: VLDB
- Year: 2022
- Pagerank: 4.3928288e-05
- Overall Rank: 9,118 | 36.57%
- DOI: 10.14778/3565838.3565853
Incoming Non-self Citations Over Time
Incoming Citations (Sorted by Pagerank)
Showing 1 of 1 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 27 of 27 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 18 | On Random Sampling over Joins | 1999 | SIGMOD | 0.00092385438 |
| 70 | Hive - A Warehousing Solution Over a Map-Reduce Framework | 2009 | VLDB | 0.00059533166 |
| 192 | HoloClean: Holistic Data Repairs with Probabilistic Inference | 2017 | VLDB | 0.00035728858 |
| 429 | The Aqua Approximate Query Answering System | 1999 | SIGMOD | 0.00023476494 |
| 758 | Deep Unsupervised Cardinality Estimation | 2020 | VLDB | 0.0001706608 |
| 791 | ActiveClean: Interactive Data Cleaning For Statistical Modeling | 2016 | VLDB | 0.00016629664 |
| 1,323 | Quickr: Lazily Approximating Complex AdHoc Queries in BigData Clusters | 2016 | SIGMOD | 0.00012601997 |
| 1,420 | Data Management Challenges in Production Machine Learning | 2017 | SIGMOD | 0.00012057956 |
| 1,482 | Automating Large-Scale Data Quality Verification | 2018 | VLDB | 0.00011725533 |
| 1,612 | Detecting Data Errors: Where are we and what needs to be done? | 2016 | VLDB | 0.00011142794 |
| 1,940 | SliceLine: Fast, Linear-Algebra-based Slice Finding for ML Model Debugging | 2021 | SIGMOD | 0.00010020173 |
| 2,152 | MISTIQUE: A System to Store and Query Model Intermediates for Model Diagnosis | 2018 | SIGMOD | 9.4239787e-05 |
| 2,163 | Elastic Machine Learning Algorithms in Amazon SageMaker | 2020 | SIGMOD | 9.3949234e-05 |
| 2,269 | Ground: A Data Context Service | 2017 | CIDR | 9.147379e-05 |
| 2,460 | Combining Quantitative and Logical Data Cleaning | 2016 | VLDB | 8.7617484e-05 |
| 4,003 | Data Platform for Machine Learning | 2019 | SIGMOD | 6.54347e-05 |
| 4,196 | Overton: A Data System for Monitoring and Improving Machine-Learned Products | 2020 | CIDR | 6.3686231e-05 |
| 4,350 | On Biased Reservoir Sampling in the Presence of Stream Evolution | 2006 | VLDB | 6.2645054e-05 |
| 4,734 | MLINSPECT: A Data Distribution Debugger for Machine Learning Pipelines | 2021 | SIGMOD | 5.9615384e-05 |
| 5,372 | ReproZip: Computational Reproducibility With Ease | 2016 | SIGMOD | 5.5428429e-05 |
| 5,684 | Dagger: A Data (not code) Debugger | 2020 | CIDR | 5.3720749e-05 |
| 6,493 | Joins on Samples: A Theoretical Guide for Practitioners | 2020 | VLDB | 5.0424713e-05 |
| 6,733 | Hindsight Logging for Model Training | 2021 | VLDB | 4.9467666e-05 |
| 6,740 | Combining Aggregation and Sampling (Nearly) Optimally for Approximate Query Processing | 2021 | SIGMOD | 4.944395e-05 |
| 8,163 | Capturing and Querying Fine-grained Provenance of Preprocessing Pipelines in Data Science | 2021 | VLDB | 4.5723431e-05 |
| 9,221 | VisClean: Interactive Cleaning for Progressive Visualization | 2020 | VLDB | 4.3699444e-05 |
| 11,313 | Towards Observability for Machine Learning Pipelines | 2022 | CIDR | 4.1945683e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 6,469 | Materialization and Reuse Optimizations for Production Data Science Pipelines | 2022 | SIGMOD | 5.0519488e-05 |
| 8,257 | Automating and Optimizing Data-Centric What-If Analyses on Native Machine Learning Pipelines | 2023 | SIGMOD | 4.5487511e-05 |
| 4,734 | MLINSPECT: A Data Distribution Debugger for Machine Learning Pipelines | 2021 | SIGMOD | 5.9615384e-05 |
| 13,231 | Cloud Observability: A MELTing Pot for Petabytes of Heterogenous Time Series | 2021 | CIDR | - |
| 7,138 | Ease.ml/ci and Ease.ml/meter in Action: Towards Data Management for Statistical Generalization | 2019 | VLDB | 4.8216981e-05 |
| 4,003 | Data Platform for Machine Learning | 2019 | SIGMOD | 6.54347e-05 |
| 6,291 | Lightweight Inspection of Data Preprocessing in Native Machine Learning Pipelines | 2021 | CIDR | 5.1269764e-05 |
| 1,420 | Data Management Challenges in Production Machine Learning | 2017 | SIGMOD | 0.00012057956 |
| 2,456 | Production Machine Learning Pipelines: Empirical Analysis and Optimization Opportunities | 2021 | SIGMOD | 8.7733773e-05 |
| 11,313 | Towards Observability for Machine Learning Pipelines | 2022 | CIDR | 4.1945683e-05 |