Database Paper Browser


Combining Quantitative and Logical Data Cleaning

Summary: Combines quantitative data cleaning (measuring statistical distortion with the Earth Mover's Distance, EMD) with logical data cleaning based on metric functional dependencies (MFDs) to detect and repair data quality problems. Key results: linear-time MFD inference; NP-hardness of computing distortion-minimal repairs; an efficient set-minimal repair algorithm, together with an axiomatization; and experiments showing reduced distortion. (summarized by gpt-5-nano on Feb 09 2026)
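The summary centers on metric functional dependencies. As a rough illustration only, and not the paper's inference or repair algorithm, the sketch below checks a single MFD X →δ Y under its usual definition: any two tuples that agree on X must have Y values within distance δ. It assumes a numeric Y and absolute difference as the metric, so a group of tuples satisfies the MFD exactly when max(Y) - min(Y) <= δ. The function name `mfd_violations` and the sample rows are hypothetical.

```python
def mfd_violations(rows, lhs, rhs, delta):
    """Check the MFD lhs ->delta rhs over a list of dict rows (hypothetical helper).

    Assumes rhs is numeric and the metric is absolute difference, so a group
    satisfies the MFD iff its rhs values span at most delta. Single pass, O(n).
    Returns {lhs key: (min, max)} for every violating group.
    """
    bounds = {}  # lhs value tuple -> (smallest rhs seen, largest rhs seen)
    for row in rows:
        key = tuple(row[a] for a in lhs)
        v = row[rhs]
        lo, hi = bounds.get(key, (v, v))
        bounds[key] = (min(lo, v), max(hi, v))
    # Keep only groups whose spread exceeds the tolerance delta.
    return {k: b for k, b in bounds.items() if b[1] - b[0] > delta}


# Hypothetical data: same venue name should imply (nearly) the same latitude.
rows = [
    {"name": "Main St Cafe", "lat": 40.71},
    {"name": "Main St Cafe", "lat": 40.72},
    {"name": "Dock Bar", "lat": 41.0},
    {"name": "Dock Bar", "lat": 41.5},
]
print(mfd_violations(rows, ["name"], "lat", 0.05))
```

Tracking only the running min and max per group keeps the check linear, which is why the absolute-difference metric is convenient here; a general metric would need a pairwise or diameter computation per group.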

Paper ID: 11322
Venue: VLDB
Year: 2016
Pagerank: 8.7617484e-05
Overall Rank: 2,460 | 82.89%
DOI: -



Incoming Citations (Sorted by Pagerank)

Showing 19 of 19 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
1,337 | HoloDetect: Few-Shot Learning for Error Detection | 2019 | SIGMOD | 0.00012497164
1,612 | Detecting Data Errors: Where are we and what needs to be done? | 2016 | VLDB | 0.00011142794
1,894 | Baran: Effective Error Correction via a Unified Context Representation and Transfer Learning | 2020 | VLDB | 0.0001018378
2,349 | RPT: Relational Pre-trained Transformer Is Almost All You Need towards Democratizing Data Preparation | 2021 | VLDB | 8.9876423e-05
6,280 | Self-supervised and Interpretable Data Cleaning with Sequence Generative Adversarial Networks | 2023 | VLDB | 5.1290457e-05
6,475 | Explain3D: Explaining Disagreements in Disjoint Datasets | 2019 | VLDB | 5.0497183e-05
7,066 | On Multiple Semantics for Declarative Database Repairs | 2020 | SIGMOD | 4.8445108e-05
8,005 | Online Topic-Aware Entity Resolution Over Incomplete Data Streams | 2021 | SIGMOD | 4.6081461e-05
8,422 | Deducing Certain Fixes to Graphs | 2019 | VLDB | 4.5167705e-05
8,745 | Sparcle: Boosting the Accuracy of Data Cleaning Systems through Spatial Awareness | 2024 | VLDB | 4.456315e-05
9,118 | Towards Observability for Production Machine Learning Pipelines | 2022 | VLDB | 4.3928288e-05
9,564 | Catching Numeric Inconsistencies in Graphs | 2018 | SIGMOD | 4.3254416e-05
9,749 | Efficient Differential Dependency Discovery | 2024 | VLDB | 4.2897489e-05
10,019 | Guardrail: Automated Integrity Constraint Synthesis From Noisy Data | 2026 | SIGMOD | 4.1945683e-05
10,706 | Extensible and Robust Evaluation of Similarity Queries | 2025 | VLDB | 4.1945683e-05
11,178 | LinCQA: Faster Consistent Query Answering with Linear Time Guarantees | 2023 | SIGMOD | 4.1945683e-05
11,454 | Contextual Data Cleaning with Ontology FDs | 2021 | SIGMOD | 4.1945683e-05
11,538 | Quality of Sentiment Analysis Tools: The Reasons of Inconsistency | 2021 | VLDB | 4.1945683e-05
11,841 | BART in Action: Error Generation and Empirical Evaluations of Data-Cleaning Systems | 2016 | SIGMOD | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 11 of 11 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.


Semantically Similar Papers