Automatic Data Repair: Are We Ready to Deploy?
Summary: Presents a driver-information taxonomy and an empirical evaluation of 12 repair methods on 12 datasets, across error rates and types and 4 downstream tasks, using a new practical error-reduction metric. A unified repair optimization improves state-of-the-art methods; the study finds that repair consistently benefits downstream analyses and provides deployment guidelines. (summarized by gpt-5-mini on Feb 09 2026)
Authors
1. Wei Ni
2. Xiaoye Miao
3. Xiangyu Zhao
4. Yangyang Wu
5. Shuwei Liang
6. Jianwei Yin
Incoming Citations (Sorted by Pagerank)
Showing 8 of 8 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 9,558 | Clean4TSDB: A Data Cleaning Tool for Time Series Databases | 2024 | VLDB | 4.3254416e-05 |
| 9,984 | Towards Scalable Visual Data Wrangling via Direct Manipulation | 2026 | CIDR | 4.1945683e-05 |
| 10,026 | Minimum Change ≠ Best Cleaning: Parallel and Incremental Error Detection under Integrity Constraints | 2026 | SIGMOD | 4.1945683e-05 |
| 10,306 | Fault Lines: Benchmarking the Impact of Label Data Quality on ML Robustness and Fairness | 2026 | VLDB | 4.1945683e-05 |
| 10,684 | Federated Incomplete Tabular Data Prediction with Missing Complementarity | 2025 | VLDB | 4.1945683e-05 |
| 10,723 | UniClean: A Scalable Data Cleaning Solution for Mixed Errors based on Unified Cleaners and Optimized Cleaning Workflow | 2025 | VLDB | 4.1945683e-05 |
| 10,811 | DemandClean: A Multi-Objective Learning Framework for Balancing Model Tolerance to Data Authenticity and Diversity | 2025 | VLDB | 4.1945683e-05 |
| 11,137 | Generalizable Data Cleaning of Tabular Data in Latent Space | 2024 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 36 of 36 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 5,660 | Descriptive and Prescriptive Data Cleaning | 2014 | SIGMOD | 5.3847321e-05 |
| 2,823 | Interaction between Record Matching and Data Repairing | 2011 | SIGMOD | 8.0593894e-05 |
| 881 | Don’t be SCAREd: Use SCalable Automatic REpairing with Maximal Likelihood and Bounded Changes | 2013 | SIGMOD | 0.00015661103 |
| 9,369 | Constraint-Variance Tolerant Data Repairing | 2016 | SIGMOD | 4.3481081e-05 |
| 7,013 | Qualitative Data Cleaning | 2016 | VLDB | 4.8619024e-05 |
| 833 | Guided Data Repair | 2011 | VLDB | 0.00016138432 |
| 1,612 | Detecting Data Errors: Where are we and what needs to be done? | 2016 | VLDB | 0.00011142794 |
| 1,627 | Data Cleaning: Overview and Emerging Challenges | 2016 | SIGMOD | 0.00011086905 |
| 623 | Improving Data Quality: Consistency and Accuracy | 2007 | VLDB | 0.00018996374 |
| 3,192 | Towards Dependable Data Repairing with Fixing Rules | 2014 | SIGMOD | 7.4095761e-05 |