PIClean: A Probabilistic and Interactive Data Cleaning System
Summary: PIClean is a probabilistic, interactive data cleaning system that uses low-rank approximation to uncover cross-column relationships for joint error detection and repair. Users confirm or reject the proposed probabilistic fixes, and this feedback continually updates the models to improve accuracy and coverage. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Zhuoran Yu
2. Xu Chu
Incoming Citations (Sorted by Pagerank)
Showing 3 of 3 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 6,187 | Semi-Supervised Data Cleaning with Raha and Baran | 2021 | CIDR | 5.1656857e-05 |
| 7,223 | Akane: Perplexity-Guided Time Series Data Cleaning | 2024 | SIGMOD | 4.7965857e-05 |
| 10,211 | SHoTClean: Bridging Soft and Hard Constraints for Multivariate Time Series Cleaning | 2026 | SIGMOD | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 6 of 6 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 112 | Potter's Wheel: An Interactive Data Cleaning System | 2001 | VLDB | 0.00047045036 |
| 192 | HoloClean: Holistic Data Repairs with Probabilistic Inference | 2017 | VLDB | 0.00035728858 |
| 555 | Discovering Denial Constraints | 2013 | VLDB | 0.00020254908 |
| 732 | Discovering Data Quality Rules | 2008 | VLDB | 0.00017465093 |
| 1,612 | Detecting Data Errors: Where are we and what needs to be done? | 2016 | VLDB | 0.00011142794 |
| 1,627 | Data Cleaning: Overview and Emerging Challenges | 2016 | SIGMOD | 0.00011086905 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 5,660 | Descriptive and Prescriptive Data Cleaning | 2014 | SIGMOD | 5.3847321e-05 |
| 7,013 | Qualitative Data Cleaning | 2016 | VLDB | 4.8619024e-05 |
| 11,515 | From Papers to Practice: The openclean Open-Source Data Cleaning Library | 2021 | VLDB | 4.1945683e-05 |
| 791 | ActiveClean: Interactive Data Cleaning For Statistical Modeling | 2016 | VLDB | 0.00016629664 |
| 4,668 | PrivateClean: Data Cleaning and Differential Privacy | 2016 | SIGMOD | 6.0115918e-05 |
| 1,627 | Data Cleaning: Overview and Emerging Challenges | 2016 | SIGMOD | 0.00011086905 |
| 192 | HoloClean: Holistic Data Repairs with Probabilistic Inference | 2017 | VLDB | 0.00035728858 |
| 11,682 | IHCS: An Integrated Hybrid Cleaning System | 2019 | VLDB | 4.1945683e-05 |
| 5,929 | ActiveClean: An Interactive Data Cleaning Framework For Modern Machine Learning | 2016 | SIGMOD | 5.2682177e-05 |
| 9,221 | VisClean: Interactive Cleaning for Progressive Visualization | 2020 | VLDB | 4.3699444e-05 |