Database Paper Browser

Back to papers

Raha: A Configuration-Free Error Detection System

Summary: Raha is a configuration-free error detection system for data cleaning. It generates a compact set of configurations to form per-tuple feature vectors, then uses sampling and learning to select representative values, leveraging historical data to prune irrelevant detectors and outperform prior work with at most 20 labels. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID
5765
Venue
SIGMOD
Year
2019
Pagerank
7.7985097e-05
Overall Rank
2,968 | 79.36%
DOI
10.1145/3299869.3324956

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 35 of 35 citing papers.

Rank Citing Paper Year Venue Pagerank
1,894 Baran: Effective Error Correction via a Unified Context Representation and Transfer Learning 2020 VLDB 0.0001018378
2,349 RPT: Relational Pre-trained Transformer Is Almost All You Need towards Democratizing Data Preparation 2021 VLDB 8.9876423e-05
2,753 Complaint-driven Training Data Debugging for Query 2.0 2020 SIGMOD 8.1724339e-05
3,299 SCODED: Statistical Constraint Oriented Data Error Detection 2020 SIGMOD 7.2546659e-05
3,396 Automatic Data Repair: Are We Ready to Deploy? 2024 VLDB 7.1455126e-05
5,028 Adaptive Data Augmentation for Supervised Learning over Missing Data 2021 VLDB 5.7506746e-05
5,096 Auto-Transform: Learning-to-Transform by Patterns 2020 VLDB 5.7011825e-05
5,222 Enabling SQL-based Training Data Debugging for Federated Learning 2022 VLDB 5.6210545e-05
5,941 Big Graphs: Challenges and Opportunities 2022 VLDB 5.2635446e-05
5,978 Rotom: A Meta-Learned Data Augmentation Framework for Entity Matching, Data Cleaning, Text Classification, and Beyond 2021 SIGMOD 5.2453012e-05
6,187 Semi-Supervised Data Cleaning with Raha and Baran 2021 CIDR 5.1656857e-05
6,280 Self-supervised and Interpretable Data Cleaning with Sequence Generative Adversarial Networks 2023 VLDB 5.1290457e-05
6,690 Parallel Discrepancy Detection and Incremental Detection 2021 VLDB 4.9621556e-05
7,838 Auto-Validate: Unsupervised Data Validation Using Data-Domain Patterns Inferred from Data Lakes 2021 SIGMOD 4.6377995e-05
8,092 Saga: A Scalable Framework for Optimizing Data Cleaning Pipelines for Machine Learning Applications 2023 SIGMOD 4.587921e-05
8,590 Exploratory Training: When Annotators Learn About Data 2023 SIGMOD 4.4896282e-05
8,745 Sparcle: Boosting the Accuracy of Data Cleaning Systems through Spatial Awareness 2024 VLDB 4.456315e-05
9,077 VerifAI: Verified Generative AI 2024 CIDR 4.4010762e-05
9,348 GIDCL: A Graph-Enhanced Interpretable Data Cleaning Framework with Large Language Models 2024 SIGMOD 4.3526427e-05
9,355 Discovering Top-k Rules using Subjective and Objective Criteria 2023 SIGMOD 4.3514328e-05
9,389 DataVinci: Learning Syntactic and Semantic String Repairs 2025 SIGMOD 4.3441378e-05
9,434 Rock: Cleaning Data by Embedding ML in Logic Rules 2024 SIGMOD 4.3430376e-05
9,460 The Battleship Approach to the Low Resource Entity Matching Problem 2023 SIGMOD 4.3366491e-05
9,487 Making It Tractable to Catch Duplicates and Conflicts in Graphs 2023 SIGMOD 4.3341665e-05
9,577 CoClean: Collaborative Data Cleaning 2020 SIGMOD 4.3248438e-05
9,777 Data Augmentation for ML-driven Data Preparation and Integration 2021 VLDB 4.2856106e-05
9,849 Reptile: Aggregation-level Explanations for Hierarchical Data 2022 SIGMOD 4.2721228e-05
10,026 Minimum Change ≠ Best Cleaning: Parallel and Incremental Error Detection under Integrity Constraints 2026 SIGMOD 4.1945683e-05
10,478 Data Enhancement for Binary Classification of Relational Data 2025 SIGMOD 4.1945683e-05
10,512 Auto-Test: Learning Semantic-Domain Constraints for Unsupervised Error Detection in Tables 2025 SIGMOD 4.1945683e-05
10,598 Auto-Prep: Holistic Prediction of Data Preparation Steps for Self-Service Business Intelligence 2025 VLDB 4.1945683e-05
10,723 UniClean: A Scalable Data Cleaning Solution for Mixed Errors based on Unified Cleaners and Optimized Cleaning Workflow 2025 VLDB 4.1945683e-05
10,821 Demonstrating Matelda for Multi-Table Error Detection 2025 VLDB 4.1945683e-05
11,111 Rock: Cleaning Data with both ML and Logic Rules 2024 VLDB 4.1945683e-05
11,223 Splitting Tuples of Mismatched Entities 2023 SIGMOD 4.1945683e-05
Previous Page 1 / 1 Next

Outgoing Citations (Sorted by Pagerank)

Showing 13 of 13 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Previous Page 1 / 1 Next

Semantically Similar Papers