Database Paper Browser

UniClean: A Scalable Data Cleaning Solution for Mixed Errors based on Unified Cleaners and Optimized Cleaning Workflow

Summary: UniClean introduces a unified, extensible cleaner construction, three preparation optimizations, and an optimized cleaning workflow to handle mixed errors at scale. Its complexity is O(|D_error|^4 * |Op| + |D| * |D_error|). It reports gains of more than 30–40% in quality and runtime over five state-of-the-art methods and cleans millions of records within hours. (summarized by gpt-5-mini on Feb 09 2026)
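
To get a feel for the complexity bound in the summary, the following minimal Python sketch evaluates its two terms for given sizes. The readings of the symbols are assumptions, since this page does not define them: |D| is taken as the number of records, |D_error| as the number of erroneous entries, and |Op| as the number of cleaning operators. The function names are hypothetical, constant factors are ignored, and nothing here is UniClean's implementation.

```python
def cost_terms(n_records: int, n_errors: int, n_ops: int) -> tuple[int, int]:
    """Return the two terms of O(|D_error|^4 * |Op| + |D| * |D_error|).

    Assumed readings (not defined on this page): n_records = |D|,
    n_errors = |D_error|, n_ops = |Op|. Constant factors are dropped.
    """
    error_term = n_errors ** 4 * n_ops   # |D_error|^4 * |Op|
    scan_term = n_records * n_errors     # |D| * |D_error|
    return error_term, scan_term


def dominant_term(n_records: int, n_errors: int, n_ops: int) -> str:
    """Name the larger of the two terms: 'error' (quartic) or 'scan'."""
    error_term, scan_term = cost_terms(n_records, n_errors, n_ops)
    return "error" if error_term >= scan_term else "scan"


if __name__ == "__main__":
    # Hypothetical sizes chosen only for illustration.
    for n_errors in (10, 1_000):
        terms = cost_terms(1_000_000, n_errors, 5)
        print(n_errors, terms, dominant_term(1_000_000, n_errors, 5))
```

Under these assumptions, with one million records and five operators, 10 errors leave the |D| * |D_error| term dominant, while 1,000 errors make the quartic term larger by a factor of 5,000.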

Paper ID: 14030
Venue: VLDB
Year: 2025
Pagerank: 4.1945683e-05
Overall Rank: 10,723 | 25.41%
DOI: 10.14778/3749466.3749681

Incoming Non-self Citations Over Time

No non-self incoming citations found for this paper in this database.

Authors

Incoming Citations (Sorted by Pagerank)

Showing 1 of 1 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
10,855 | bNDCRepair: Cleaning both Data Errors and Inaccurate Constraints on Numerical Sequential Data | 2025 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 27 of 27 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
192 | HoloClean: Holistic Data Repairs with Probabilistic Inference | 2017 | VLDB | 0.00035728858
221 | Deep Entity Matching with Pre-Trained Language Models | 2021 | VLDB | 0.00033121824
555 | Discovering Denial Constraints | 2013 | VLDB | 0.00020254908
702 | Reasoning about Record Matching Rules | 2009 | VLDB | 0.00017918203
754 | Distributed Representations of Tuples for Entity Resolution | 2018 | VLDB | 0.00017117211
881 | Don’t be SCAREd: Use SCalable Automatic REpairing with Maximal Likelihood and Bounded Changes | 2013 | SIGMOD | 0.00015661103
1,012 | NADEEF: A Commodity Data Cleaning System | 2013 | SIGMOD | 0.0001464733
1,337 | HoloDetect: Few-Shot Learning for Error Detection | 2019 | SIGMOD | 0.00012497164
1,612 | Detecting Data Errors: Where are we and what needs to be done? | 2016 | VLDB | 0.00011142794
1,894 | Baran: Effective Error Correction via a Unified Context Representation and Transfer Learning | 2020 | VLDB | 0.0001018378
2,638 | Messing Up with BART: Error Generation for Evaluating Data-Cleaning Algorithms | 2016 | VLDB | 8.399764e-05
2,946 | BigDansing: A System for Big Data Cleansing | 2015 | SIGMOD | 7.8372441e-05
2,968 | Raha: A Configuration-Free Error Detection System | 2019 | SIGMOD | 7.7985097e-05
3,396 | Automatic Data Repair: Are We Ready to Deploy? | 2024 | VLDB | 7.1455126e-05
3,825 | Cleanits: A Data Cleaning System for Industrial Time Series | 2019 | VLDB | 6.7255837e-05
4,273 | Cleaning Denial Constraint Violations through Relaxation | 2020 | SIGMOD | 6.3003864e-05
5,153 | Horizon: Scalable Dependency-driven Data Cleaning | 2021 | VLDB | 5.6607963e-05
6,261 | The Cosmos Big Data Platform at Microsoft: Over a Decade of Progress and a Decade to Look Forward | 2021 | VLDB | 5.1350714e-05
6,583 | SCREEN: Stream Data Cleaning under Speed Constraints | 2015 | SIGMOD | 5.0027988e-05
7,013 | Qualitative Data Cleaning | 2016 | VLDB | 4.8619024e-05
7,926 | CoCo: Interactive Exploration of Conformance Constraints for Data Understanding and Data Cleaning | 2021 | SIGMOD | 4.6144554e-05
9,221 | VisClean: Interactive Cleaning for Progressive Visualization | 2020 | VLDB | 4.3699444e-05
9,558 | Clean4TSDB: A Data Cleaning Tool for Time Series Databases | 2024 | VLDB | 4.3254416e-05
9,560 | MTSClean: Efficient Constraint-based Cleaning for Multi-Dimensional Time Series Data | 2024 | VLDB | 4.3254416e-05
9,649 | DAFDiscover: Robust Mining Algorithm for Dynamic Approximate Functional Dependencies on Dirty Data | 2024 | VLDB | 4.3109001e-05
9,760 | Adaptive data transformations for QaaS | 2025 | CIDR | 4.2856106e-05
9,771 | EasyDR: A Human-in-the-loop Error Detection and Repair Platform for Holistic Table Cleaning | 2022 | VLDB | 4.2856106e-05

Semantically Similar Papers