Fault Lines: Benchmarking the Impact of Label Data Quality on ML Robustness and Fairness

Summary: FAULT LINES is a model-agnostic benchmark (15 datasets with systematic, diverse label corruptions) plus an evaluation suite that measures robustness and fairness across 22 state-of-the-art classifiers. Key result: many models resist random label noise, but less than 10% biased noise causes large losses in accuracy and fairness; transformers often handle biased noise better than gradient-boosted decision trees (GBDTs), though with higher tuning-dependent variance. (summarized by gpt-5-mini on Mar 13 2026)
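
The distinction the summary draws between random and biased label noise can be illustrated with a toy sketch. This is not the benchmark's corruption procedure: the function names, the two synthetic groups "a" and "b", the binary labels, and the 8% rate are all assumptions chosen for illustration. Random noise flips any label with equal probability, while the biased variant here flips only positive labels of one group, which shifts the positive-rate gap between groups even though fewer labels change overall.

```python
import random

def flip_labels_random(labels, rate, rng):
    """Flip each binary label independently with probability `rate`."""
    return [1 - y if rng.random() < rate else y for y in labels]

def flip_labels_biased(labels, groups, target_group, rate, rng):
    """Flip positive labels to negative, but only inside `target_group`.

    Hypothetical bias model: one group's positives are under-recorded.
    """
    return [
        0 if (g == target_group and y == 1 and rng.random() < rate) else y
        for y, g in zip(labels, groups)
    ]

def positive_rate(labels, groups, group):
    """Fraction of positive labels among members of `group`."""
    members = [y for y, g in zip(labels, groups) if g == group]
    return sum(members) / len(members)

# Synthetic data: two groups, labels independent of group membership.
rng = random.Random(0)
n = 10_000
groups = [rng.choice(["a", "b"]) for _ in range(n)]
labels = [rng.randint(0, 1) for _ in range(n)]

random_noisy = flip_labels_random(labels, 0.08, rng)
biased_noisy = flip_labels_biased(labels, groups, "b", 0.08, rng)

# Compare the between-group positive-rate gap under each corruption.
for name, noisy in [("clean", labels), ("random", random_noisy), ("biased", biased_noisy)]:
    gap = positive_rate(noisy, groups, "a") - positive_rate(noisy, groups, "b")
    print(f"{name:>6}: positive-rate gap (a - b) = {gap:+.3f}")
```

In this toy setup the random flips leave the gap between groups roughly unchanged in expectation, whereas the biased flips lower the positive rate of group "b" only, which is the kind of skew a fairness metric would pick up.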

Paper ID: 14363
Venue: VLDB
Year: 2026
Pagerank: 4.1905499e-05
Overall Rank: 10,318 | 28.29%
DOI: 10.14778/3785297.3785308

Incoming Non-self Citations Over Time

No non-self incoming citations found for this paper in this database.

Authors

Incoming Citations (Sorted by Pagerank)

Showing 0 of 0 citing papers.

Outgoing Citations (Sorted by Pagerank)

Showing 3 of 3 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank	Cited Paper	Year	Venue	Pagerank
1,406	Responsible Data Management	2020	VLDB	0.0001216385
3,397	Automatic Data Repair: Are We Ready to Deploy?	2024	VLDB	7.1386386e-05
6,552	How do Categorical Duplicates Affect ML? A New Benchmark and Empirical Analyses	2024	VLDB	5.0109216e-05

Semantically Similar Papers

Overall Rank	Paper	Year	Venue	Pagerank
9,373	Falcon: Fair Active Learning using Multi-armed Bandits	2024	VLDB	4.3460825e-05
1,869	Interpretable Data-Based Explanations for Fairness Debugging	2022	SIGMOD	0.00010263235
9,684	How to Design Robust Algorithms using Noisy Comparison Oracle	2021	VLDB	4.3006524e-05
3,767	Cleaning Crowdsourced Labels Using Oracles for Statistical Classification	2019	VLDB	6.7748725e-05
4,761	Automated Feature Engineering for Algorithmic Fairness	2021	VLDB	5.9341687e-05
1,037	Interventional Fairness: Causal Database Repair for Algorithmic Fairness	2019	SIGMOD	0.00014514825
3,397	Automatic Data Repair: Are We Ready to Deploy?	2024	VLDB	7.1386386e-05
7,605	Causal Feature Selection for Algorithmic Fairness	2022	SIGMOD	4.6943015e-05
6,850	Through the Data Management Lens: Experimental Analysis and Evaluation of Fair Classification	2022	SIGMOD	4.9036077e-05
10,764	Stress-Testing ML Pipelines with Adversarial Data Corruption	2025	VLDB	4.1905499e-05