UniClean: A Scalable Data Cleaning Solution for Mixed Errors based on Unified Cleaners and Optimized Cleaning Workflow
Summary: UniClean combines a unified, extensible way of constructing cleaners with three preparation optimizations and an optimized cleaning workflow to handle mixed errors at scale. Its stated complexity is O(|D_error|^4 * |Op| + |D| * |D_error|). The summary reports gains of more than 30–40% in cleaning quality and runtime over five state-of-the-art methods, with millions of records cleaned within hours.
(summarized by gpt-5-mini on Feb 09 2026)
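To get a feel for the complexity bound quoted in the summary, the sketch below evaluates its two terms for a few sizes. It assumes that |D| is the number of records, |D_error| the size of the erroneous subset, and |Op| the number of cleaning operators. The function name and all sizes are hypothetical and chosen only for illustration, and constant factors are ignored.

```python
def cleaning_cost_bound(n_records: int, n_errors: int, n_ops: int) -> int:
    """Evaluate |D_error|^4 * |Op| + |D| * |D_error|, ignoring constants."""
    return n_errors ** 4 * n_ops + n_records * n_errors


# Hypothetical sizes, not taken from the paper.
n_records = 1_000_000
n_ops = 10

for n_errors in (10, 100, 1_000):
    quartic_term = n_errors ** 4 * n_ops   # depends only on the error subset
    linear_term = n_records * n_errors     # scales with the full dataset
    total = cleaning_cost_bound(n_records, n_errors, n_ops)
    print(f"errors={n_errors:>5}  quartic={quartic_term:.1e}  "
          f"linear={linear_term:.1e}  total={total:.1e}")
```

With these made-up sizes, the term that is linear in |D| dominates at 10 errors, while the quartic term takes over by 100 errors. That is consistent with the bound growing only linearly in the dataset size when the erroneous subset stays small.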
- Paper ID: 14030
- Venue: VLDB
- Year: 2025
- Pagerank: 4.1945683e-05
- Overall Rank: 10,723 | 25.41%
- DOI: 10.14778/3749466.3749681
Incoming Non-self Citations Over Time
No non-self incoming citations found for this paper in this database.
Incoming Citations (Sorted by Pagerank)
Showing 1 of 1 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 27 of 27 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
| --- | --- | --- | --- | --- |
| 192 | HoloClean: Holistic Data Repairs with Probabilistic Inference | 2017 | VLDB | 0.00035728858 |
| 221 | Deep Entity Matching with Pre-Trained Language Models | 2021 | VLDB | 0.00033121824 |
| 555 | Discovering Denial Constraints | 2013 | VLDB | 0.00020254908 |
| 702 | Reasoning about Record Matching Rules | 2009 | VLDB | 0.00017918203 |
| 754 | Distributed Representations of Tuples for Entity Resolution | 2018 | VLDB | 0.00017117211 |
| 881 | Don’t be SCAREd: Use SCalable Automatic REpairing with Maximal Likelihood and Bounded Changes | 2013 | SIGMOD | 0.00015661103 |
| 1,012 | NADEEF: A Commodity Data Cleaning System | 2013 | SIGMOD | 0.0001464733 |
| 1,337 | HoloDetect: Few-Shot Learning for Error Detection | 2019 | SIGMOD | 0.00012497164 |
| 1,612 | Detecting Data Errors: Where are we and what needs to be done? | 2016 | VLDB | 0.00011142794 |
| 1,894 | Baran: Effective Error Correction via a Unified Context Representation and Transfer Learning | 2020 | VLDB | 0.0001018378 |
| 2,638 | Messing Up with BART: Error Generation for Evaluating Data-Cleaning Algorithms | 2016 | VLDB | 8.399764e-05 |
| 2,946 | BigDansing: A System for Big Data Cleansing | 2015 | SIGMOD | 7.8372441e-05 |
| 2,968 | Raha: A Configuration-Free Error Detection System | 2019 | SIGMOD | 7.7985097e-05 |
| 3,396 | Automatic Data Repair: Are We Ready to Deploy? | 2024 | VLDB | 7.1455126e-05 |
| 3,825 | Cleanits: A Data Cleaning System for Industrial Time Series | 2019 | VLDB | 6.7255837e-05 |
| 4,273 | Cleaning Denial Constraint Violations through Relaxation | 2020 | SIGMOD | 6.3003864e-05 |
| 5,153 | Horizon: Scalable Dependency-driven Data Cleaning | 2021 | VLDB | 5.6607963e-05 |
| 6,261 | The Cosmos Big Data Platform at Microsoft: Over a Decade of Progress and a Decade to Look Forward | 2021 | VLDB | 5.1350714e-05 |
| 6,583 | SCREEN: Stream Data Cleaning under Speed Constraints | 2015 | SIGMOD | 5.0027988e-05 |
| 7,013 | Qualitative Data Cleaning | 2016 | VLDB | 4.8619024e-05 |
| 7,926 | CoCo: Interactive Exploration of Conformance Constraints for Data Understanding and Data Cleaning | 2021 | SIGMOD | 4.6144554e-05 |
| 9,221 | VisClean: Interactive Cleaning for Progressive Visualization | 2020 | VLDB | 4.3699444e-05 |
| 9,558 | Clean4TSDB: A Data Cleaning Tool for Time Series Databases | 2024 | VLDB | 4.3254416e-05 |
| 9,560 | MTSClean: Efficient Constraint-based Cleaning for Multi-Dimensional Time Series Data | 2024 | VLDB | 4.3254416e-05 |
| 9,649 | DAFDiscover: Robust Mining Algorithm for Dynamic Approximate Functional Dependencies on Dirty Data | 2024 | VLDB | 4.3109001e-05 |
| 9,760 | Adaptive data transformations for QaaS | 2025 | CIDR | 4.2856106e-05 |
| 9,771 | EasyDR: A Human-in-the-loop Error Detection and Repair Platform for Holistic Table Cleaning | 2022 | VLDB | 4.2856106e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
| --- | --- | --- | --- | --- |
| 7,013 | Qualitative Data Cleaning | 2016 | VLDB | 4.8619024e-05 |
| 7,867 | Learning Over Dirty Data Without Cleaning | 2020 | SIGMOD | 4.6320452e-05 |
| 2,946 | BigDansing: A System for Big Data Cleansing | 2015 | SIGMOD | 7.8372441e-05 |
| 5,660 | Descriptive and Prescriptive Data Cleaning | 2014 | SIGMOD | 5.3847321e-05 |
| 1,612 | Detecting Data Errors: Where are we and what needs to be done? | 2016 | VLDB | 0.00011142794 |
| 11,682 | IHCS: An Integrated Hybrid Cleaning System | 2019 | VLDB | 4.1945683e-05 |
| 13,232 | Data Cleaning in the Era of Data Science: Challenges and Opportunities | 2021 | CIDR | - |
| 2,158 | Uni-Detect: A Unified Approach to Automated Error Detection in Tables | 2019 | SIGMOD | 9.4141354e-05 |
| 1,627 | Data Cleaning: Overview and Emerging Challenges | 2016 | SIGMOD | 0.00011086905 |
| 7,237 | CleanM: An Optimizable Query Language for Unified Scale-Out Data Cleaning | 2017 | VLDB | 4.7928651e-05 |