Semi-Supervised Data Cleaning with Raha and Baran
Summary: Raha and Baran are configuration-free semi-supervised systems for end-to-end error detection and correction that learn to combine an auto-generated pool of base detectors/correctors from ~20 labeled tuples via label propagation. They leverage transfer learning from prior cleaning tasks to speed up detection and improve correction effectiveness. (summarized by gpt-5-mini on Feb 09 2026)
Incoming Citations (Sorted by Pagerank)
Showing 7 of 7 citing papers.
| Overall Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 9,348 | GIDCL: A Graph-Enhanced Interpretable Data Cleaning Framework with Large Language Models | 2024 | SIGMOD | 4.3526427e-05 |
| 9,389 | DataVinci: Learning Syntactic and Semantic String Repairs | 2025 | SIGMOD | 4.3441378e-05 |
| 9,560 | MTSClean: Efficient Constraint-based Cleaning for Multi-Dimensional Time Series Data | 2024 | VLDB | 4.3254416e-05 |
| 10,395 | User-Centric Property Graph Repairs | 2025 | SIGMOD | 4.1945683e-05 |
| 10,821 | Demonstrating Matelda for Multi-Table Error Detection | 2025 | VLDB | 4.1945683e-05 |
| 10,855 | bNDCRepair: Cleaning both Data Errors and Inaccurate Constraints on Numerical Sequential Data | 2025 | VLDB | 4.1945683e-05 |
| 11,137 | Generalizable Data Cleaning of Tabular Data in Latent Space | 2024 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 15 of 15 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 7,013 | Qualitative Data Cleaning | 2016 | VLDB | 4.8619024e-05 |
| 11,137 | Generalizable Data Cleaning of Tabular Data in Latent Space | 2024 | VLDB | 4.1945683e-05 |
| 3,396 | Automatic Data Repair: Are We Ready to Deploy? | 2024 | VLDB | 7.1455126e-05 |
| 6,280 | Self-supervised and Interpretable Data Cleaning with Sequence Generative Adversarial Networks | 2023 | VLDB | 5.1290457e-05 |
| 1,894 | Baran: Effective Error Correction via a Unified Context Representation and Transfer Learning | 2020 | VLDB | 0.0001018378 |
| 5,660 | Descriptive and Prescriptive Data Cleaning | 2014 | SIGMOD | 5.3847321e-05 |
| 13,232 | Data Cleaning in the Era of Data Science: Challenges and Opportunities | 2021 | CIDR | - |
| 1,627 | Data Cleaning: Overview and Emerging Challenges | 2016 | SIGMOD | 0.00011086905 |
| 10,512 | Auto-Test: Learning Semantic-Domain Constraints for Unsupervised Error Detection in Tables | 2025 | SIGMOD | 4.1945683e-05 |
| 2,968 | Raha: A Configuration-Free Error Detection System | 2019 | SIGMOD | 7.7985097e-05 |