Database Paper Browser

HoloClean: Holistic Data Repairs with Probabilistic Inference

Summary: HoloClean couples constraint-driven and statistical data repair by automatically generating a probabilistic program from the dirty data. Its inference scales to datasets with millions of tuples, reaching an average precision of ~90% and recall of ~76%, an average F1 more than 2x that of state-of-the-art methods. (summarized by gpt-5-nano on Feb 09 2026)
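For reference, F1 is the harmonic mean of precision and recall. The Python sketch below uses the approximate averages quoted in the summary as inputs. Treating them as a single precision/recall pair is an assumption: the mean of per-dataset F1 scores generally differs from the F1 of the mean precision and recall, so the result is only a rough indication.

```python
def f1_score(precision: float, recall: float) -> float:
    """Standard F1: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are 0
    return 2 * precision * recall / (precision + recall)


# Assumption: treat the summary's averages as one precision/recall pair.
p, r = 0.90, 0.76
print(f"F1 ~ {f1_score(p, r):.3f}")  # prints "F1 ~ 0.824"
```

Because the harmonic mean is pulled toward the smaller value, the ~76% recall dominates the result more than the ~90% precision does.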

Paper ID: 11404
Venue: VLDB
Year: 2017
Pagerank: 0.00035728858
Overall Rank: 192 | 98.67%
DOI: -

Incoming Citations (Sorted by Pagerank)

Showing 33 of 133 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
10,026 | Minimum Change ≠ Best Cleaning: Parallel and Incremental Error Detection under Integrity Constraints | 2026 | SIGMOD | 4.1945683e-05
10,061 | Cleaning Time Series under Seasonal and Trend Constraints | 2026 | SIGMOD | 4.1945683e-05
10,140 | Analyzing Deviations from Monotonic Trends through Database Repair | 2026 | SIGMOD | 4.1945683e-05
10,211 | SHoTClean: Bridging Soft and Hard Constraints for Multivariate Time Series Cleaning | 2026 | SIGMOD | 4.1945683e-05
10,377 | FastPDB: Towards Bag-Probabilistic Queries at Interactive Speeds | 2025 | SIGMOD | 4.1945683e-05
10,395 | User-Centric Property Graph Repairs | 2025 | SIGMOD | 4.1945683e-05
10,489 | Incremental Rule Discovery in Response to Parameter Updates | 2025 | SIGMOD | 4.1945683e-05
10,512 | Auto-Test: Learning Semantic-Domain Constraints for Unsupervised Error Detection in Tables | 2025 | SIGMOD | 4.1945683e-05
10,513 | Computing Inconsistency Measures Under Differential Privacy | 2025 | SIGMOD | 4.1945683e-05
10,560 | A Systematic Study on Early Stopping Metrics in HPO and the Implications of Uncertainty | 2025 | VLDB | 4.1945683e-05
10,676 | Meaningful Data Erasure in the Presence of Dependencies | 2025 | VLDB | 4.1945683e-05
10,679 | How and Why False Denial Constraints are Discovered | 2025 | VLDB | 4.1945683e-05
10,723 | UniClean: A Scalable Data Cleaning Solution for Mixed Errors based on Unified Cleaners and Optimized Cleaning Workflow | 2025 | VLDB | 4.1945683e-05
10,744 | DIM-SUM: Dynamic IMputation for Smart Utility Management | 2025 | VLDB | 4.1945683e-05
10,811 | DemandClean: A Multi-Objective Learning Framework for Balancing Model Tolerance to Data Authenticity and Diversity | 2025 | VLDB | 4.1945683e-05
10,838 | New Trends in Data Forgetting for Sustainable Data Management | 2025 | VLDB | 4.1945683e-05
10,855 | bNDCRepair: Cleaning both Data Errors and Inaccurate Constraints on Numerical Sequential Data | 2025 | VLDB | 4.1945683e-05
11,000 | MisDetect: Iterative Mislabel Detection using Early Loss | 2024 | VLDB | 4.1945683e-05
11,050 | Win-Win: On Simultaneous Clustering and Imputing over Incomplete Data | 2024 | VLDB | 4.1945683e-05
11,054 | Enriching Relations with Additional Attributes for ER | 2024 | VLDB | 4.1945683e-05
11,069 | Hardware-Efficient Data Imputation through DBMS Extensibility | 2024 | VLDB | 4.1945683e-05
11,111 | Rock: Cleaning Data with both ML and Logic Rules | 2024 | VLDB | 4.1945683e-05
11,137 | Generalizable Data Cleaning of Tabular Data in Latent Space | 2024 | VLDB | 4.1945683e-05
11,178 | LinCQA: Faster Consistent Query Answering with Linear Time Guarantees | 2023 | SIGMOD | 4.1945683e-05
11,187 | Regularized Pairwise Relationship based Analytics for Structured Data | 2023 | SIGMOD | 4.1945683e-05
11,223 | Splitting Tuples of Mismatched Entities | 2023 | SIGMOD | 4.1945683e-05
11,399 | ActivePDB: Active Probabilistic Databases | 2022 | VLDB | 4.1945683e-05
11,409 | Machine Programming: Turning Data into Programmer Productivity | 2022 | VLDB | 4.1945683e-05
11,431 | Ease.ML: A Lifecycle Management System for MLDev and MLOps | 2021 | CIDR | 4.1945683e-05
11,515 | From Papers to Practice: The openclean Open-Source Data Cleaning Library | 2021 | VLDB | 4.1945683e-05
11,543 | Migrating a Privacy-Safe Information Extraction System to a Software 2.0 Design | 2020 | CIDR | 4.1945683e-05
11,584 | T-REx: Table Repair Explanations | 2020 | SIGMOD | 4.1945683e-05
11,682 | IHCS: An Integrated Hybrid Cleaning System | 2019 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 23 of 23 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
265 | A Cost-Based Model and Effective Heuristic for Repairing Constraints by Value Modification | 2005 | SIGMOD | 0.00029763412
322 | Record Linkage: Similarity Measures and Algorithms | 2006 | SIGMOD | 0.00027518768
489 | Data Curation at Scale: The Data Tamer System | 2013 | CIDR | 0.00022030728
555 | Discovering Denial Constraints | 2013 | VLDB | 0.00020254908
560 | Dependencies Revisited for Improving Data Quality | 2008 | PODS | 0.00020141923
623 | Improving Data Quality: Consistency and Accuracy | 2007 | VLDB | 0.00018996374
656 | ERACER: A Database Approach for Statistical Inference and Data Cleaning | 2010 | SIGMOD | 0.00018588729
667 | Incremental Knowledge Base Construction Using DeepDive | 2015 | VLDB | 0.00018440557
702 | Reasoning about Record Matching Rules | 2009 | VLDB | 0.00017918203
814 | Entity Resolution: Theory, Practice & Open Challenges | 2012 | VLDB | 0.00016370594
881 | Don’t be SCAREd: Use SCalable Automatic REpairing with Maximal Likelihood and Bounded Changes | 2013 | SIGMOD | 0.00015661103
1,012 | NADEEF: A Commodity Data Cleaning System | 2013 | SIGMOD | 0.0001464733
1,014 | Tuffy: Scaling up Statistical Inference in Markov Logic Networks using an RDBMS | 2011 | VLDB | 0.00014640258
1,044 | DimmWitted: A Study of Main-Memory Statistical Analytics | 2014 | VLDB | 0.00014475229
1,159 | Towards Certain Fixes with Editing Rules and Master Data | 2010 | VLDB | 0.00013592813
1,197 | The LLUNATIC Data-Cleaning Framework | 2013 | VLDB | 0.00013390321
1,211 | Truth Finding on the Deep Web: Is the Problem Solved? | 2013 | VLDB | 0.00013257101
1,546 | KATARA: A Data Cleaning System Powered by Knowledge Bases and Crowdsourcing | 2015 | SIGMOD | 0.00011446851
1,612 | Detecting Data Errors: Where are we and what needs to be done? | 2016 | VLDB | 0.00011142794
1,624 | Sampling the Repairs of Functional Dependency Violations under Hard Constraints | 2010 | VLDB | 0.00011099222
3,042 | Dichotomies in the Complexity of Preferred Repairs | 2015 | PODS | 7.669374e-05
3,192 | Towards Dependable Data Repairing with Fixing Rules | 2014 | SIGMOD | 7.4095761e-05
3,897 | SLiMFast: Guaranteed Results for Data Fusion and Source Reliability | 2017 | SIGMOD | 6.6554845e-05