From Papers to Practice: The openclean Open-Source Data Cleaning Library
Summary: openclean is an open-source Python library that unifies data cleaning and data profiling in a single, extensible environment. It is designed to bridge research and practice and makes it easy to integrate state-of-the-art algorithms. Researchers can contribute new methods, and practitioners can use them to prototype data-wrangling workflows. (summarized by gpt-5-nano on Feb 09 2026)
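The summary describes one environment where profiling and cleaning algorithms can be plugged in side by side. The sketch below illustrates that idea using only the Python standard library. It is not openclean's actual API. The registry, the operator names (`profile.value_counts`, `clean.standardize`) and the helper functions are all hypothetical. Each operator is an ordinary function registered under a name, so a newly contributed algorithm becomes usable as soon as it is registered.

```python
from collections import Counter
from typing import Callable, Dict, List

Row = Dict[str, str]
Table = List[Row]

# Hypothetical plug-in registry (NOT openclean's real API): profiling and
# cleaning operators are stored by name in one shared table.
OPERATORS: Dict[str, Callable[..., object]] = {}


def register(name: str):
    """Decorator that adds a function to the shared operator registry."""
    def decorator(fn):
        OPERATORS[name] = fn
        return fn
    return decorator


@register("profile.value_counts")
def value_counts(table: Table, column: str) -> Counter:
    """Profiling step: count how often each value occurs in a column."""
    return Counter(row[column] for row in table)


@register("clean.standardize")
def standardize(table: Table, column: str, mapping: Dict[str, str]) -> Table:
    """Cleaning step: replace variant spellings via a mapping; unmapped values are kept."""
    return [{**row, column: mapping.get(row[column], row[column])} for row in table]


def run(name: str, *args, **kwargs):
    """Look up a registered operator by name and apply it."""
    return OPERATORS[name](*args, **kwargs)


if __name__ == "__main__":
    rows = [{"borough": "BK"}, {"borough": "Brooklyn"}, {"borough": "Queens"}]
    # Profile first to spot variant spellings, then clean, then re-profile.
    print(run("profile.value_counts", rows, "borough"))
    cleaned = run("clean.standardize", rows, "borough", mapping={"BK": "Brooklyn"})
    print(run("profile.value_counts", cleaned, "borough"))
```

In this sketch, profiling and cleaning share one lookup mechanism. A workflow can therefore alternate between the two steps. Inspecting value counts exposes the variant spelling, and the same registry then applies the fix.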
Incoming Non-self Citations Over Time
No non-self incoming citations found for this paper in this database.
Authors
1. Heiko Müller
2. Sonia Castelo
3. Munaf Qazi
4. Juliana Freire
Incoming Citations (Sorted by Pagerank)
Showing 0 of 0 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
Outgoing Citations (Sorted by Pagerank)
Showing 4 of 4 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 192 | HoloClean: Holistic Data Repairs with Probabilistic Inference | 2017 | VLDB | 0.00035728858 |
| 221 | Deep Entity Matching with Pre-Trained Language Models | 2021 | VLDB | 0.00033121824 |
| 676 | Archiving Scientific Data | 2002 | SIGMOD | 0.00018281665 |
| 1,625 | Data Profiling with Metanome | 2015 | VLDB | 0.00011094926 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 5,981 | DataPrep.EDA: Task-Centric Exploratory Data Analysis for Statistical Modeling in Python | 2021 | SIGMOD | 5.2448986e-05 |
| 8,593 | Wisteria: Nurturing Scalable Data Cleaning Infrastructure | 2015 | VLDB | 4.4891474e-05 |
| 1,627 | Data Cleaning: Overview and Emerging Challenges | 2016 | SIGMOD | 0.00011086905 |
| 199 | Declarative Data Cleaning: Language, Model, and Algorithms | 2001 | VLDB | 0.00035041015 |
| 7,564 | PIClean: A Probabilistic and Interactive Data Cleaning System | 2019 | SIGMOD | 4.7093702e-05 |
| 5,929 | ActiveClean: An Interactive Data Cleaning Framework For Modern Machine Learning | 2016 | SIGMOD | 5.2682177e-05 |
| 9,221 | VisClean: Interactive Cleaning for Progressive Visualization | 2020 | VLDB | 4.3699444e-05 |
| 9,577 | CoClean: Collaborative Data Cleaning | 2020 | SIGMOD | 4.3248438e-05 |
| 7,237 | CleanM: An Optimizable Query Language for Unified Scale-Out Data Cleaning | 2017 | VLDB | 4.7928651e-05 |
| 13,232 | Data Cleaning in the Era of Data Science: Challenges and Opportunities | 2021 | CIDR | - |