Learning to Validate the Predictions of Black Box Classifiers on Unseen Data
Summary: The paper learns a performance predictor for pretrained black-box classifiers from programmatic specifications of dataset shift and data errors, without distributional assumptions. The predictor raises an alarm when it predicts an accuracy drop on unseen serving data, and it outperforms baselines across datasets and error types.
(summarized by gpt-5-nano on Feb 09 2026)
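The idea in the summary can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the toy `black_box_proba` classifier, the `corrupt_missing` error specification, the percentile features, the k-nearest-neighbour regressor, and the 0.1 alarm tolerance are all hypothetical choices made here for the example.

```python
import math
import random

def black_box_proba(x):
    """Stand-in for an opaque pretrained classifier: returns P(label = 1 | x)."""
    return 1.0 / (1.0 + math.exp(-3.0 * x))

def sample_batch(rng, n):
    """Draw n labeled points: label 1 is centered at +2, label 0 at -2."""
    labels = [rng.randint(0, 1) for _ in range(n)]
    xs = [rng.gauss(2.0 if y == 1 else -2.0, 1.0) for y in labels]
    return xs, labels

def corrupt_missing(rng, xs, fraction):
    """Hypothetical error spec: replace roughly `fraction` of values with 0.0."""
    return [0.0 if rng.random() < fraction else x for x in xs]

def accuracy(xs, labels):
    """Fraction of points the black box classifies correctly (needs labels)."""
    hits = sum((black_box_proba(x) >= 0.5) == (y == 1) for x, y in zip(xs, labels))
    return hits / len(xs)

def featurize(xs):
    """Label-free features: percentiles and mean of the model's confidence."""
    conf = sorted(max(p, 1.0 - p) for p in map(black_box_proba, xs))
    n = len(conf)
    picks = [conf[min(n - 1, int(q * n))] for q in (0.1, 0.25, 0.5, 0.75, 0.9)]
    return picks + [sum(conf) / n]

def build_examples(rng, xs, labels, n_samples=200):
    """Corrupt held-out data many times and record (features, true accuracy)."""
    examples = []
    for _ in range(n_samples):
        corrupted = corrupt_missing(rng, xs, rng.random())
        examples.append((featurize(corrupted), accuracy(corrupted, labels)))
    return examples

def predict_accuracy(examples, features, k=5):
    """k-nearest-neighbour regression over the synthetic examples."""
    nearest = sorted(examples, key=lambda ex: math.dist(ex[0], features))[:k]
    return sum(acc for _, acc in nearest) / k

rng = random.Random(0)
held_out_x, held_out_y = sample_batch(rng, 1000)
baseline = accuracy(held_out_x, held_out_y)
examples = build_examples(rng, held_out_x, held_out_y)

def should_alarm(serving_xs, tolerance=0.1):
    """Alarm when predicted accuracy falls more than `tolerance` below baseline."""
    return predict_accuracy(examples, featurize(serving_xs)) < baseline - tolerance

# Unlabeled serving data: one clean batch and one with 60% of values zeroed.
clean_x, clean_y = sample_batch(rng, 1000)
broken_x = corrupt_missing(rng, clean_x, 0.6)
print("alarm on clean batch: ", should_alarm(clean_x))
print("alarm on broken batch:", should_alarm(broken_x))
```

Note that `should_alarm` only looks at the model's outputs on the serving batch, so it needs no serving labels; labels are used only on the held-out data when building the synthetic examples.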
- Paper ID: 5817
- Venue: SIGMOD
- Year: 2020
- Pagerank: 6.4428544e-05
- Overall Rank: 4,110 | 71.41%
- DOI: 10.1145/3318464.3380604
Incoming Citations (Sorted by Pagerank)
Showing 10 of 10 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1,940 | SliceLine: Fast, Linear-Algebra-based Slice Finding for ML Model Debugging | 2021 | SIGMOD | 0.00010020173 |
| 5,028 | Adaptive Data Augmentation for Supervised Learning over Missing Data | 2021 | VLDB | 5.7506746e-05 |
| 6,944 | DataPrism: Exposing Disconnect between Data and Systems | 2022 | SIGMOD | 4.8912787e-05 |
| 7,202 | Conformance Constraint Discovery: Measuring Trust in Data-Driven Systems | 2021 | SIGMOD | 4.8023314e-05 |
| 8,092 | Saga: A Scalable Framework for Optimizing Data Cleaning Pipelines for Machine Learning Applications | 2023 | SIGMOD | 4.587921e-05 |
| 9,098 | Scapin: Scalable Graph Structure Perturbation by Augmented Influence Maximization | 2023 | SIGMOD | 4.3967784e-05 |
| 9,231 | Modyn: Data-Centric Machine Learning Pipeline Orchestration | 2025 | SIGMOD | 4.3690661e-05 |
| 9,806 | The Image Calculator: 10x Faster Image-AI Inference by Replacing JPEG with Self-designing Storage Format | 2024 | SIGMOD | 4.2805224e-05 |
| 11,052 | Efficiently Mitigating the Impact of Data Drift on Machine Learning Pipelines | 2024 | VLDB | 4.1945683e-05 |
| 11,500 | Comprehensible Counterfactual Explanation on Kolmogorov-Smirnov Test | 2021 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 1 of 1 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 11,627 | Exploiting Domain Knowledge to address Multi-Class Imbalance and a Heterogeneous Feature Space in Classification Tasks for Manufacturing Data | 2020 | VLDB | 4.1945683e-05 |
| 3,491 | TensorFlow Data Validation: Data Analysis and Validation in Continuous ML Pipelines | 2020 | SIGMOD | 7.0451276e-05 |
| 11,313 | Towards Observability for Machine Learning Pipelines | 2022 | CIDR | 4.1945683e-05 |
| 7,838 | Auto-Validate: Unsupervised Data Validation Using Data-Domain Patterns Inferred from Data Lakes | 2021 | SIGMOD | 4.6377995e-05 |
| 3,142 | Active Learning for ML Enhanced Database Systems | 2020 | SIGMOD | 7.4815444e-05 |
| 1,482 | Automating Large-Scale Data Quality Verification | 2018 | VLDB | 0.00011725533 |
| 11,000 | MisDetect: Iterative Mislabel Detection using Early Loss | 2024 | VLDB | 4.1945683e-05 |
| 6,134 | Finding Label and Model Errors in Perception Data With Learned Observation Assertions | 2022 | SIGMOD | 5.1943414e-05 |
| 9,118 | Towards Observability for Production Machine Learning Pipelines | 2022 | VLDB | 4.3928288e-05 |
| 11,052 | Efficiently Mitigating the Impact of Data Drift on Machine Learning Pipelines | 2024 | VLDB | 4.1945683e-05 |