Raha: A Configuration-Free Error Detection System
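Summary: Raha is a configuration-free error detection system for data cleaning. It systematically generates a limited set of configurations for error detection algorithms and uses their outputs to build a feature vector for each data value. A sampling and classification scheme then selects the most representative values for labeling, and historical data can be used to filter out irrelevant detectors and configurations. In the reported experiments, Raha outperforms prior error detection techniques with at most 20 labeled tuples per dataset. (summarized by gpt-5-nano on Feb 09 2026)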
Authors
1. Mohammad Mahdavi
2. Ziawasch Abedjan
3. Raul Castro Fernandez
4. Samuel Madden
5. Mourad Ouzzani
6. Michael Stonebraker
7. Nan Tang
Incoming Citations (Sorted by Pagerank)
Showing 35 of 35 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 13 of 13 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 8,585 | Robust Entity Resolution using Random Graphs | 2018 | SIGMOD | 4.4905755e-05 |
| 3,396 | Automatic Data Repair: Are We Ready to Deploy? | 2024 | VLDB | 7.1455126e-05 |
| 1,894 | Baran: Effective Error Correction via a Unified Context Representation and Transfer Learning | 2020 | VLDB | 0.0001018378 |
| 3,976 | UGuide – User-Guided Discovery of FD-Detectable Errors | 2017 | SIGMOD | 6.5736462e-05 |
| 1,612 | Detecting Data Errors: Where are we and what needs to be done? | 2016 | VLDB | 0.00011142794 |
| 1,337 | HoloDetect: Few-Shot Learning for Error Detection | 2019 | SIGMOD | 0.00012497164 |
| 3,105 | Data X-Ray: A Diagnostic Tool for Data Errors | 2015 | SIGMOD | 7.5568954e-05 |
| 2,506 | Auto-Detect: Data-Driven Error Detection in Tables | 2018 | SIGMOD | 8.6335464e-05 |
| 10,512 | Auto-Test: Learning Semantic-Domain Constraints for Unsupervised Error Detection in Tables | 2025 | SIGMOD | 4.1945683e-05 |
| 6,187 | Semi-Supervised Data Cleaning with Raha and Baran | 2021 | CIDR | 5.1656857e-05 |