Stress-Testing ML Pipelines with Adversarial Data Corruption
Summary: SAVAGE is a causally inspired framework that encodes structured data-quality issues via dependency graphs and flexible corruption templates, then uses bi-level optimization to find worst-case corruptions against an entire ML pipeline treated as a black box. The paper demonstrates that roughly 5% targeted corruptions discovered by SAVAGE drastically harm performance across cleaning, fairness, and uncertainty tasks, far outperforming random and manual corruptions and exposing key robustness blind spots. (summarized by gpt-5-mini on Feb 09 2026)
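To make the idea of searching for worst-case corruptions against a black-box pipeline concrete, here is a minimal toy sketch. It is not SAVAGE's algorithm: the 1-D threshold "pipeline", the label-flipping template `template_corrupt`, the grid search standing in for the optimizer, and the reading of 5% as a share of training rows are all assumptions made for illustration, and dependency graphs are not modeled at all.

```python
import random

def make_data(n, seed):
    """Synthetic rows (x, y) following the clean rule y = 1 iff x > 0.5."""
    rng = random.Random(seed)
    return [(x, int(x > 0.5)) for x in (rng.random() for _ in range(n))]

def accuracy(data, threshold):
    """Fraction of rows whose label matches the prediction x > threshold."""
    return sum(int(x > threshold) == y for x, y in data) / len(data)

def pipeline(train, test):
    """Black box (inner level): fit a threshold on train, report test accuracy."""
    candidates = [i / 50 for i in range(51)]
    fitted = max(candidates, key=lambda t: accuracy(train, t))
    return accuracy(test, fitted)

def template_corrupt(train, lo, width, budget):
    """Hypothetical template: flip labels of rows with lo <= x < lo + width,
    stopping once `budget` rows have been changed."""
    out, used = [], 0
    for x, y in train:
        hit = used < budget and lo <= x < lo + width
        out.append((x, 1 - y) if hit else (x, y))
        used += hit
    return out

def random_corrupt(train, budget, seed):
    """Baseline: flip the labels of `budget` rows chosen uniformly at random."""
    chosen = set(random.Random(seed).sample(range(len(train)), budget))
    return [(x, 1 - y) if i in chosen else (x, y)
            for i, (x, y) in enumerate(train)]

def search_worst_case(train, test, budget):
    """Outer level: grid-search template parameters that minimize the
    accuracy returned by the black-box pipeline."""
    results = []
    for lo in [i / 20 for i in range(20)]:
        for width in (0.05, 0.1, 0.2):
            corrupted = template_corrupt(train, lo, width, budget)
            results.append((pipeline(corrupted, test), lo, width))
    return min(results, key=lambda r: r[0])

train, test = make_data(400, seed=0), make_data(400, seed=1)
budget = len(train) // 20  # assumption: 5% of the training rows
clean_acc = pipeline(train, test)
random_acc = pipeline(random_corrupt(train, budget, seed=2), test)
worst_acc, lo, width = search_worst_case(train, test, budget)
print(f"clean={clean_acc:.3f} random={random_acc:.3f} "
      f"targeted={worst_acc:.3f} (lo={lo}, width={width})")
```

In this toy setting the targeted search tends to concentrate label flips near the decision boundary, which shifts the fitted threshold, while the same number of random flips is spread too thinly to move it. A real study would replace the grid search and the hand-written template with the framework's own components.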
Incoming Non-self Citations Over Time
No non-self incoming citations found for this paper in this database.
Authors
1. Jiongli Zhu
2. Geyang Xu
3. Felipe Lorenzi
4. Boris Glavic
5. Babak Salimi
Incoming Citations (Sorted by Pagerank)
Showing 0 of 0 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
Outgoing Citations (Sorted by Pagerank)
Showing 6 of 6 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 942 | A Formal Approach to Finding Explanations for Database Queries | 2014 | SIGMOD | 0.00015155714 |
| 1,867 | Interpretable Data-Based Explanations for Fairness Debugging | 2022 | SIGMOD | 0.00010272055 |
| 1,940 | SliceLine: Fast, Linear-Algebra-based Slice Finding for ML Model Debugging | 2021 | SIGMOD | 0.00010020173 |
| 5,429 | DiffPrep: Differentiable Data Preprocessing Pipeline Search for Learning over Tabular Data | 2023 | SIGMOD | 0.000055087325 |
| 6,779 | Explaining Inference Queries with Bayesian Optimization | 2021 | VLDB | 0.000049280116 |
| 7,046 | Through the Data Management Lens: Experimental Analysis and Evaluation of Fair Classification | 2022 | SIGMOD | 0.000048525913 |