MisDetect: Iterative Mislabel Detection using Early Loss
Summary: MisDetect detects label noise during training by iteratively flagging examples with high early loss, applying influence-based verification to the flagged examples, and stopping automatically once the early-loss signal fades. For ambiguous instances, it generates pseudo-labels to train a binary verifier. It outperforms 10 baselines on 15 datasets.
(summarized by gpt-5-mini on Feb 09 2026)
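The iterative early-loss idea in the summary can be illustrated with a small sketch. This is not the MisDetect algorithm from the paper: it only shows the general pattern of training briefly, flagging the highest-loss examples, and stopping when no remaining example has a high loss. It omits the influence-based verification and the pseudo-label verifier entirely. The logistic model, the function names (`train_briefly`, `flag_by_early_loss`), and the parameters `per_round` and `stop_loss` are all assumptions made for illustration.

```python
import numpy as np

def per_example_loss(w, b, X, y):
    """Binary cross-entropy of a logistic model, one value per example."""
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def train_briefly(X, y, epochs=5, lr=0.5):
    """A few full-batch gradient steps, standing in for 'early' training."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad = p - y
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def flag_by_early_loss(X, y, rounds=5, per_round=10, stop_loss=0.7):
    """Iteratively flag the highest early-loss examples not yet flagged.

    stop_loss is a hypothetical stopping rule: a round that finds no
    unflagged example with loss above it ends the loop early.
    """
    flagged = np.zeros(len(y), dtype=bool)
    for _ in range(rounds):
        keep = ~flagged
        # Retrain from scratch on the examples that are still trusted.
        w, b = train_briefly(X[keep], y[keep])
        loss = per_example_loss(w, b, X, y)
        loss[flagged] = -np.inf  # never re-select flagged examples
        top = np.argsort(loss)[-per_round:]
        top = top[loss[top] > stop_loss]
        if top.size == 0:
            break
        flagged[top] = True
    return np.flatnonzero(flagged)
```

Calling `flag_by_early_loss(X, y)` on a feature matrix `X` and a 0/1 label vector `y` returns the indices of the suspected mislabels. On well-separated data, a briefly trained model fits the clean majority first, so flipped labels tend to keep a high loss, which is the intuition behind using early loss as a noise signal.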
- Paper ID: 13364
- Venue: VLDB
- Year: 2024
- Pagerank: 4.1945683e-05
- Overall Rank: 11,000 | 23.48%
- DOI: 10.14778/3648160.3648161
Incoming Non-self Citations Over Time
No non-self incoming citations found for this paper in this database.
Incoming Citations (Sorted by Pagerank)
Showing 4 of 4 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 13 of 13 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
| --- | --- | --- | --- | --- |
| 192 | HoloClean: Holistic Data Repairs with Probabilistic Inference | 2017 | VLDB | 0.00035728858 |
| 1,337 | HoloDetect: Few-Shot Learning for Error Detection | 2019 | SIGMOD | 0.00012497164 |
| 1,867 | Interpretable Data-Based Explanations for Fairness Debugging | 2022 | SIGMOD | 0.00010272055 |
| 2,753 | Complaint-driven Training Data Debugging for Query 2.0 | 2020 | SIGMOD | 8.1724339e-05 |
| 3,773 | Cleaning Crowdsourced Labels Using Oracles for Statistical Classification | 2019 | VLDB | 6.7758649e-05 |
| 4,102 | GoodCore: Data-effective and Data-efficient Machine Learning through Coreset Selection over Incomplete Data | 2023 | SIGMOD | 6.4522929e-05 |
| 5,279 | CDB: A Crowd-Powered Database System | 2018 | VLDB | 5.5902418e-05 |
| 5,362 | Cost-Effective Crowdsourced Entity Resolution: A Partial-Order Approach | 2016 | SIGMOD | 5.5473503e-05 |
| 5,381 | Selective Data Acquisition in the Wild for Model Charging | 2022 | VLDB | 5.5399508e-05 |
| 7,179 | Coresets over Multiple Tables for Feature-rich and Data-efficient Machine Learning | 2023 | VLDB | 4.8078895e-05 |
| 7,575 | Human-in-the-loop Outlier Detection | 2020 | SIGMOD | 4.7068909e-05 |
| 7,796 | CHEF: A Cheap and Fast Pipeline for Iteratively Cleaning Label Uncertainties | 2021 | VLDB | 4.6482625e-05 |
| 9,221 | VisClean: Interactive Cleaning for Progressive Visualization | 2020 | VLDB | 4.3699444e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
| --- | --- | --- | --- | --- |
| 1,337 | HoloDetect: Few-Shot Learning for Error Detection | 2019 | SIGMOD | 0.00012497164 |
| 11,052 | Efficiently Mitigating the Impact of Data Drift on Machine Learning Pipelines | 2024 | VLDB | 4.1945683e-05 |
| 8,714 | LANCET: Labeling Complex Data at Scale | 2021 | VLDB | 4.4619818e-05 |
| 10,953 | Certain and Approximately Certain Models for Statistical Learning | 2024 | SIGMOD | 4.1945683e-05 |
| 10,478 | Data Enhancement for Binary Classification of Relational Data | 2025 | SIGMOD | 4.1945683e-05 |
| 8,590 | Exploratory Training: When Annotators Learn About Data | 2023 | SIGMOD | 4.4896282e-05 |
| 6,134 | Finding Label and Model Errors in Perception Data With Learned Observation Assertions | 2022 | SIGMOD | 5.1943414e-05 |
| 9,896 | Towards Interpretable and Learnable Risk Analysis for Entity Resolution | 2020 | SIGMOD | 4.2600049e-05 |
| 4,110 | Learning to Validate the Predictions of Black Box Classifiers on Unseen Data | 2020 | SIGMOD | 6.4428544e-05 |
| 10,528 | Two Birds with One Stone: Efficient Deep Learning over Mislabeled Data through Subset Selection | 2025 | SIGMOD | 4.1945683e-05 |