Database Paper Browser

Improving Data Quality: Consistency and Accuracy

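Summary: Conditional functional dependencies (CFDs) capture inconsistencies and errors that traditional FDs cannot. The paper proposes two repair algorithms for CFD violations: a static one that repairs a dirty database and an incremental one that handles updates to a clean database. Both repair problems are NP-hard, so the algorithms are heuristic; experiments show they are effective, and a statistical method guarantees that repair accuracy stays above a predefined rate with little user effort. (summarized by gpt-5-nano on Feb 09 2026)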
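To make the summary's first claim concrete, the sketch below shows how a CFD can flag an error that a plain FD would either miss or over-report. It is an illustration only, not the paper's repair algorithm: it covers detection, not repair, and the relation, the attribute names (`cc`, `ac`, `zip`, `street`, `city`), the sample rows, and the helper `cfd_violations` are all assumptions made for this example.

```python
from collections import defaultdict

WILDCARD = "_"  # pattern entry that matches any value


def matches(value, pattern):
    """True if a cell value satisfies one pattern entry."""
    return pattern == WILDCARD or value == pattern


def cfd_violations(rows, lhs, rhs, pattern):
    """Return sorted indices of rows that violate the CFD (lhs -> rhs, pattern).

    Hypothetical helper: `pattern` maps every attribute in lhs + [rhs]
    to a constant or to WILDCARD.
    """
    flagged = set()
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        # The dependency only applies to rows matching the LHS pattern.
        if not all(matches(row[a], pattern[a]) for a in lhs):
            continue
        # A constant on the RHS can be violated by a single row.
        if not matches(row[rhs], pattern[rhs]):
            flagged.add(i)
        groups[tuple(row[a] for a in lhs)].append(i)
    # A wildcard RHS behaves like an FD restricted to the matching rows:
    # rows that agree on the LHS must agree on the RHS.
    if pattern[rhs] == WILDCARD:
        for members in groups.values():
            if len({rows[i][rhs] for i in members}) > 1:
                flagged.update(members)
    return sorted(flagged)


# Made-up sample data for the illustration.
rows = [
    {"cc": "44", "ac": "131", "zip": "EH4 1DT", "street": "Mayfield", "city": "EDI"},
    {"cc": "44", "ac": "131", "zip": "EH4 1DT", "street": "Crichton", "city": "EDI"},
    {"cc": "01", "ac": "908", "zip": "07974", "street": "Tree Ave", "city": "NYC"},
    {"cc": "01", "ac": "908", "zip": "07974", "street": "Elm Str", "city": "MH"},
]

# zip determines street, but only where cc = 44.
phi1 = {"cc": "44", "zip": WILDCARD, "street": WILDCARD}
# cc = 01 and ac = 908 force city = MH.
phi2 = {"cc": "01", "ac": "908", "city": "MH"}
# The same dependency as phi1 with no condition, i.e. a traditional FD.
plain_fd = {"cc": WILDCARD, "zip": WILDCARD, "street": WILDCARD}

print(cfd_violations(rows, ["cc", "zip"], "street", phi1))      # [0, 1]
print(cfd_violations(rows, ["cc", "ac"], "city", phi2))         # [2]
print(cfd_violations(rows, ["cc", "zip"], "street", plain_fd))  # [0, 1, 2, 3]
```

In this sample the unconditional FD also flags rows 2 and 3, where a shared zip with different streets may be legitimate, and it has no way to express that row 2 carries the wrong city; the two conditional patterns handle both cases.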

Paper ID: 9586
Venue: VLDB
Year: 2007
Pagerank: 0.00018996374
Overall Rank: 623 | 95.67%
DOI: -

Authors

Incoming Citations (Sorted by Pagerank)

Showing 44 of 44 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
192 | HoloClean: Holistic Data Repairs with Probabilistic Inference | 2017 | VLDB | 0.00035728858
560 | Dependencies Revisited for Improving Data Quality | 2008 | PODS | 0.00020141923
656 | ERACER: A Database Approach for Statistical Inference and Data Cleaning | 2010 | SIGMOD | 0.00018588729
732 | Discovering Data Quality Rules | 2008 | VLDB | 0.00017465093
833 | Guided Data Repair | 2011 | VLDB | 0.00016138432
881 | Don’t be SCAREd: Use SCalable Automatic REpairing with Maximal Likelihood and Bounded Changes | 2013 | SIGMOD | 0.00015661103
1,012 | NADEEF: A Commodity Data Cleaning System | 2013 | SIGMOD | 0.0001464733
1,159 | Towards Certain Fixes with Editing Rules and Master Data | 2010 | VLDB | 0.00013592813
1,188 | On Generating Near-Optimal Tableaux for Conditional Functional Dependencies | 2008 | VLDB | 0.00013441729
1,197 | The LLUNATIC Data-Cleaning Framework | 2013 | VLDB | 0.00013390321
1,546 | KATARA: A Data Cleaning System Powered by Knowledge Bases and Crowdsourcing | 2015 | SIGMOD | 0.00011446851
1,627 | Data Cleaning: Overview and Emerging Challenges | 2016 | SIGMOD | 0.00011086905
2,158 | Uni-Detect: A Unified Approach to Automated Error Detection in Tables | 2019 | SIGMOD | 9.4141354e-05
2,266 | Estimating the Confidence of Conditional Functional Dependencies | 2009 | SIGMOD | 9.1540815e-05
2,379 | A Revival of Integrity Constraints for Data Cleaning | 2008 | VLDB | 8.9392633e-05
2,638 | Messing Up with BART: Error Generation for Evaluating Data-Cleaning Algorithms | 2016 | VLDB | 8.399764e-05
2,823 | Interaction between Record Matching and Data Repairing | 2011 | SIGMOD | 8.0593894e-05
3,105 | Data X-Ray: A Diagnostic Tool for Data Errors | 2015 | SIGMOD | 7.5568954e-05
3,192 | Towards Dependable Data Repairing with Fixing Rules | 2014 | SIGMOD | 7.4095761e-05
3,713 | GDR: A System for Guided Data Repair | 2010 | SIGMOD | 6.8224341e-05
4,744 | Effective and Complete Discovery of Order Dependencies via Set-based Axiomatization | 2017 | VLDB | 5.957936e-05
5,096 | Auto-Transform: Learning-to-Transform by Patterns | 2020 | VLDB | 5.7011825e-05
5,153 | Horizon: Scalable Dependency-driven Data Cleaning | 2021 | VLDB | 5.6607963e-05
5,445 | QFix: Diagnosing Errors through Query Histories | 2017 | SIGMOD | 5.5020909e-05
5,803 | Semandaq: A Data Quality System Based on Conditional Functional Dependencies | 2008 | VLDB | 5.3205861e-05
6,280 | Self-supervised and Interpretable Data Cleaning with Sequence Generative Adversarial Networks | 2023 | VLDB | 5.1290457e-05
6,546 | Properties of Inconsistency Measures for Databases | 2021 | SIGMOD | 5.0185588e-05
6,690 | Parallel Discrepancy Detection and Incremental Detection | 2021 | VLDB | 4.9621556e-05
6,705 | Consistent Query Answers in Inconsistent Probabilistic Databases | 2010 | SIGMOD | 4.9549359e-05
7,013 | Qualitative Data Cleaning | 2016 | VLDB | 4.8619024e-05
7,605 | The Computation of Optimal Subset Repairs | 2020 | VLDB | 4.697534e-05
7,766 | ICARUS: Minimizing Human Effort in Iterative Data Completion | 2018 | VLDB | 4.6564959e-05
7,867 | Learning Over Dirty Data Without Cleaning | 2020 | SIGMOD | 4.6320452e-05
8,875 | CerFix: A System for Cleaning Data with Certain Fixes | 2011 | VLDB | 4.430475e-05
9,056 | A Data Quality Metric (DQM): How to Estimate the Number of Undetected Errors in Data Sets | 2017 | VLDB | 4.4039656e-05
9,348 | GIDCL: A Graph-Enhanced Interpretable Data Cleaning Framework with Large Language Models | 2024 | SIGMOD | 4.3526427e-05
9,434 | Rock: Cleaning Data by Embedding ML in Logic Rules | 2024 | SIGMOD | 4.3430376e-05
9,478 | Incremental Detection of Denial Constraint Violations | 2025 | VLDB | 4.3341665e-05
9,487 | Making It Tractable to Catch Duplicates and Conflicts in Graphs | 2023 | SIGMOD | 4.3341665e-05
10,867 | T-Assess: An Efficient Data Quality Assessment System Tailored for Trajectory Data | 2025 | VLDB | 4.1945683e-05
11,069 | Hardware-Efficient Data Imputation through DBMS Extensibility | 2024 | VLDB | 4.1945683e-05
11,223 | Splitting Tuples of Mismatched Entities | 2023 | SIGMOD | 4.1945683e-05
11,454 | Contextual Data Cleaning with Ontology FDs | 2021 | SIGMOD | 4.1945683e-05
11,841 | BART in Action: Error Generation and Empirical Evaluations of Data-Cleaning Systems | 2016 | SIGMOD | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 6 of 6 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Semantically Similar Papers