Database Paper Browser


Automatic Data Repair: Are We Ready to Deploy?

Summary: Proposes a taxonomy of repair methods based on the driver information they use. It then empirically evaluates 12 repair methods on 12 datasets, across varying error rates and error types and on 4 downstream tasks, using a new practical error-reduction metric. A unified repair-optimization strategy improves state-of-the-art methods. The results show that repair consistently benefits downstream analyses, and the paper provides deployment guidelines. (summarized by gpt-5-mini on Feb 09 2026)

Paper ID: 13485
Venue: VLDB
Year: 2024
Pagerank: 7.1455126e-05
Overall Rank: 3,396 | 76.38%
DOI: 10.14778/3675034.3675051

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 8 of 8 citing papers.


Outgoing Citations (Sorted by Pagerank)

Showing 36 of 36 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
192 | HoloClean: Holistic Data Repairs with Probabilistic Inference | 2017 | VLDB | 0.00035728858
199 | Declarative Data Cleaning: Language, Model, and Algorithms | 2001 | VLDB | 0.00035041015
221 | Deep Entity Matching with Pre-Trained Language Models | 2021 | VLDB | 0.00033121824
300 | Deep Learning for Entity Matching: A Design Space Exploration | 2018 | SIGMOD | 0.00028441466
667 | Incremental Knowledge Base Construction Using DeepDive | 2015 | VLDB | 0.00018440557
702 | Reasoning about Record Matching Rules | 2009 | VLDB | 0.00017918203
754 | Distributed Representations of Tuples for Entity Resolution | 2018 | VLDB | 0.00017117211
791 | ActiveClean: Interactive Data Cleaning For Statistical Modeling | 2016 | VLDB | 0.00016629664
881 | Don’t be SCAREd: Use SCalable Automatic REpairing with Maximal Likelihood and Bounded Changes | 2013 | SIGMOD | 0.00015661103
936 | Framework for Evaluating Clustering Algorithms in Duplicate Detection | 2009 | VLDB | 0.0001521549
1,047 | Functional Dependency Discovery: An Experimental Evaluation of Seven Algorithms | 2015 | VLDB | 0.00014459715
1,159 | Towards Certain Fixes with Editing Rules and Master Data | 2010 | VLDB | 0.00013592813
1,197 | The LLUNATIC Data-Cleaning Framework | 2013 | VLDB | 0.00013390321
1,337 | HoloDetect: Few-Shot Learning for Error Detection | 2019 | SIGMOD | 0.00012497164
1,546 | KATARA: A Data Cleaning System Powered by Knowledge Bases and Crowdsourcing | 2015 | SIGMOD | 0.00011446851
1,612 | Detecting Data Errors: Where are we and what needs to be done? | 2016 | VLDB | 0.00011142794
1,627 | Data Cleaning: Overview and Emerging Challenges | 2016 | SIGMOD | 0.00011086905
1,894 | Baran: Effective Error Correction via a Unified Context Representation and Transfer Learning | 2020 | VLDB | 0.0001018378
1,935 | A Data- and Workload-Aware Algorithm for Range Queries Under Differential Privacy | 2014 | VLDB | 0.00010032967
2,158 | Uni-Detect: A Unified Approach to Automated Error Detection in Tables | 2019 | SIGMOD | 9.4141354e-05
2,253 | Efficient Denial Constraint Discovery with Hydra | 2018 | VLDB | 9.1937209e-05
2,483 | Discovery of Approximate (and Exact) Denial Constraints | 2020 | VLDB | 8.6864916e-05
2,506 | Auto-Detect: Data-Driven Error Detection in Tables | 2018 | SIGMOD | 8.6335464e-05
2,638 | Messing Up with BART: Error Generation for Evaluating Data-Cleaning Algorithms | 2016 | VLDB | 8.399764e-05
2,946 | BigDansing: A System for Big Data Cleansing | 2015 | SIGMOD | 7.8372441e-05
2,968 | Raha: A Configuration-Free Error Detection System | 2019 | SIGMOD | 7.7985097e-05
3,140 | ZeroER: Entity Resolution using Zero Labeled Examples | 2020 | SIGMOD | 7.4841763e-05
3,311 | Efficient and Effective Data Imputation with Influence Functions | 2022 | VLDB | 7.2406486e-05
3,861 | Generating Concise Entity Matching Rules | 2017 | SIGMOD | 6.6878164e-05
4,273 | Cleaning Denial Constraint Violations through Relaxation | 2020 | SIGMOD | 6.3003864e-05
4,607 | Data Integration and Machine Learning: A Natural Synergy | 2018 | SIGMOD | 6.0538827e-05
5,153 | Horizon: Scalable Dependency-driven Data Cleaning | 2021 | VLDB | 5.6607963e-05
6,350 | NADEEF: A Generalized Data Cleaning System | 2013 | VLDB | 5.101815e-05
6,711 | Analyzing How BERT Performs Entity Matching | 2022 | VLDB | 4.9517546e-05
9,077 | VerifAI: Verified Generative AI | 2024 | CIDR | 4.4010762e-05
9,369 | Constraint-Variance Tolerant Data Repairing | 2016 | SIGMOD | 4.3481081e-05

Semantically Similar Papers

Overall Rank | Paper | Year | Venue | Pagerank
5,660 | Descriptive and Prescriptive Data Cleaning | 2014 | SIGMOD | 5.3847321e-05
2,823 | Interaction between Record Matching and Data Repairing | 2011 | SIGMOD | 8.0593894e-05
881 | Don’t be SCAREd: Use SCalable Automatic REpairing with Maximal Likelihood and Bounded Changes | 2013 | SIGMOD | 0.00015661103
9,369 | Constraint-Variance Tolerant Data Repairing | 2016 | SIGMOD | 4.3481081e-05
7,013 | Qualitative Data Cleaning | 2016 | VLDB | 4.8619024e-05
833 | Guided Data Repair | 2011 | VLDB | 0.00016138432
1,612 | Detecting Data Errors: Where are we and what needs to be done? | 2016 | VLDB | 0.00011142794
1,627 | Data Cleaning: Overview and Emerging Challenges | 2016 | SIGMOD | 0.00011086905
623 | Improving Data Quality: Consistency and Accuracy | 2007 | VLDB | 0.00018996374
3,192 | Towards Dependable Data Repairing with Fixing Rules | 2014 | SIGMOD | 7.4095761e-05