Database Paper Browser

NADEEF: A Commodity Data Cleaning System

Summary: NADEEF is a commodity, end-to-end data cleaning platform with a programmable rule interface and a core for error detection and repair. It extends beyond CFDs, MDs, and ETL rules; the core cleans data holistically, with two implementations of repair algorithms, and the system is validated on real data. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 4734
Venue: SIGMOD
Year: 2013
Pagerank: 0.0001464733
Overall Rank: 1,012 | 92.97%
DOI: -

Incoming Citations (Sorted by Pagerank)

Showing 50 of 53 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
192 | HoloClean: Holistic Data Repairs with Probabilistic Inference | 2017 | VLDB | 0.00035728858
517 | Can Foundation Models Wrangle Your Data? | 2023 | VLDB | 0.00021169035
712 | Magellan: Toward Building Entity Matching Management Systems | 2016 | VLDB | 0.00017732426
1,277 | The Data Civilizer System | 2017 | CIDR | 0.00012879695
1,337 | HoloDetect: Few-Shot Learning for Error Detection | 2019 | SIGMOD | 0.00012497164
1,546 | KATARA: A Data Cleaning System Powered by Knowledge Bases and Crowdsourcing | 2015 | SIGMOD | 0.00011446851
1,612 | Detecting Data Errors: Where are we and what needs to be done? | 2016 | VLDB | 0.00011142794
1,627 | Data Cleaning: Overview and Emerging Challenges | 2016 | SIGMOD | 0.00011086905
1,894 | Baran: Effective Error Correction via a Unified Context Representation and Transfer Learning | 2020 | VLDB | 0.0001018378
2,184 | A Sample-and-Clean Framework for Fast and Accurate Query Processing on Dirty Data | 2014 | SIGMOD | 9.3429789e-05
2,638 | Messing Up with BART: Error Generation for Evaluating Data-Cleaning Algorithms | 2016 | VLDB | 8.399764e-05
2,946 | BigDansing: A System for Big Data Cleansing | 2015 | SIGMOD | 7.8372441e-05
2,968 | Raha: A Configuration-Free Error Detection System | 2019 | SIGMOD | 7.7985097e-05
3,192 | Towards Dependable Data Repairing with Fixing Rules | 2014 | SIGMOD | 7.4095761e-05
3,265 | RHEEM: Enabling Cross-Platform Data Processing - May The Big Data Be With You! - | 2018 | VLDB | 7.3083672e-05
3,582 | NADEEF/ER: Generic and Interactive Entity Resolution | 2014 | SIGMOD | 6.9479263e-05
3,976 | UGuide – User-Guided Discovery of FD-Detectable Errors | 2017 | SIGMOD | 6.5736462e-05
4,127 | A Statistical Perspective on Discovering Functional Dependencies in Noisy Data | 2020 | SIGMOD | 6.4310458e-05
4,273 | Cleaning Denial Constraint Violations through Relaxation | 2020 | SIGMOD | 6.3003864e-05
4,464 | Magellan: Toward Building Entity Matching Management Systems over Data Science Stacks | 2016 | VLDB | 6.1606042e-05
4,904 | Temporal Rules Discovery for Web Data Cleaning | 2016 | VLDB | 5.8399195e-05
5,153 | Horizon: Scalable Dependency-driven Data Cleaning | 2021 | VLDB | 5.6607963e-05
5,192 | Pattern Functional Dependencies for Data Cleaning | 2020 | VLDB | 5.6375087e-05
5,382 | That's All Folks! LLUNATIC Goes Open Source | 2014 | VLDB | 5.5397633e-05
5,445 | QFix: Diagnosing Errors through Query Histories | 2017 | SIGMOD | 5.5020909e-05
5,506 | Exploring Change – A New Dimension of Data Analytics | 2019 | VLDB | 5.473324e-05
5,586 | QuERy: A Framework for Integrating Entity Resolution with Query Processing | 2016 | VLDB | 5.4219548e-05
5,618 | Explaining Repaired Data with CFDs | 2018 | VLDB | 5.4079415e-05
5,660 | Descriptive and Prescriptive Data Cleaning | 2014 | SIGMOD | 5.3847321e-05
6,187 | Semi-Supervised Data Cleaning with Raha and Baran | 2021 | CIDR | 5.1656857e-05
6,280 | Self-supervised and Interpretable Data Cleaning with Sequence Generative Adversarial Networks | 2023 | VLDB | 5.1290457e-05
6,350 | NADEEF: A Generalized Data Cleaning System | 2013 | VLDB | 5.101815e-05
6,739 | Benchmarking Approximate Consistent Query Answering | 2021 | PODS | 4.9449088e-05
7,013 | Qualitative Data Cleaning | 2016 | VLDB | 4.8619024e-05
7,237 | CleanM: An Optimizable Query Language for Unified Scale-Out Data Cleaning | 2017 | VLDB | 4.7928651e-05
7,605 | The Computation of Optimal Subset Repairs | 2020 | VLDB | 4.697534e-05
7,766 | ICARUS: Minimizing Human Effort in Iterative Data Completion | 2018 | VLDB | 4.6564959e-05
8,092 | Saga: A Scalable Framework for Optimizing Data Cleaning Pipelines for Machine Learning Applications | 2023 | SIGMOD | 4.587921e-05
8,593 | Wisteria: Nurturing Scalable Data Cleaning Infrastructure | 2015 | VLDB | 4.4891474e-05
8,745 | Sparcle: Boosting the Accuracy of Data Cleaning Systems through Spatial Awareness | 2024 | VLDB | 4.456315e-05
8,836 | Fast Approximate Denial Constraint Discovery | 2023 | VLDB | 4.4393184e-05
8,840 | The Cost of Representation by Subset Repairs | 2025 | VLDB | 4.4388652e-05
9,054 | Selecting Data to Clean for Fact Checking: Minimizing Uncertainty vs. Maximizing Surprise | 2019 | VLDB | 4.4039656e-05
9,278 | Interactive and Deterministic Data Cleaning: A Tossed Stone Raises a Thousand Ripples | 2016 | SIGMOD | 4.3639892e-05
9,749 | Efficient Differential Dependency Discovery | 2024 | VLDB | 4.2897489e-05
9,810 | Rheem: Enabling Multi-Platform Task Execution | 2016 | SIGMOD | 4.278405e-05
10,216 | The Case For Language Model Approximated LIKE Predicate | 2026 | SIGMOD | 4.1945683e-05
10,610 | Weak-to-Strong Prompts with Lightweight-to-Powerful LLMs for High-Accuracy, Low-Cost, and Explainable Data Transformation | 2025 | VLDB | 4.1945683e-05
10,676 | Meaningful Data Erasure in the Presence of Dependencies | 2025 | VLDB | 4.1945683e-05
10,723 | UniClean: A Scalable Data Cleaning Solution for Mixed Errors based on Unified Cleaners and Optimized Cleaning Workflow | 2025 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 10 of 10 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Semantically Similar Papers