CleanM: An Optimizable Query Language for Unified Scale-Out Data Cleaning
Summary: CleanM is a unified, optimizable query language for scale-out data cleaning. Its three-level translation enables cross-operator optimization; atop CleanDB, it covers more corruption types, scales better, and unifies querying and cleaning under a single interface. (summarized by gpt-5-nano on Feb 09 2026)
Incoming Non-self Citations Over Time
Authors
Incoming Citations (Sorted by Pagerank)
Showing 4 of 4 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 4,273 | Cleaning Denial Constraint Violations through Relaxation | 2020 | SIGMOD | 6.3003864e-05 |
| 6,658 | Scalable Querying of Nested Data | 2021 | VLDB | 4.9711629e-05 |
| 8,118 | Maximus: A Modular Accelerated Query Engine for Data Analytics on Heterogeneous Systems | 2025 | SIGMOD | 4.5814829e-05 |
| 9,771 | EasyDR: A Human-in-the-loop Error Detection and Repair Platform for Holistic Table Cleaning | 2022 | VLDB | 4.2856106e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 22 of 22 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 6,729 | Keyword Query Cleaning | 2008 | VLDB | 4.9483065e-05 |
| 8,007 | A Grammar-based Entity Representation Framework for Data Cleaning | 2009 | SIGMOD | 4.6068018e-05 |
| 7,867 | Learning Over Dirty Data Without Cleaning | 2020 | SIGMOD | 4.6320452e-05 |
| 1,627 | Data Cleaning: Overview and Emerging Challenges | 2016 | SIGMOD | 0.00011086905 |
| 5,929 | ActiveClean: An Interactive Data Cleaning Framework For Modern Machine Learning | 2016 | SIGMOD | 5.2682177e-05 |
| 2,946 | BigDansing: A System for Big Data Cleansing | 2015 | SIGMOD | 7.8372441e-05 |
| 11,515 | From Papers to Practice: The openclean Open-Source Data Cleaning Library | 2021 | VLDB | 4.1945683e-05 |
| 13,232 | Data Cleaning in the Era of Data Science: Challenges and Opportunities | 2021 | CIDR | - |
| 199 | Declarative Data Cleaning: Language, Model, and Algorithms | 2001 | VLDB | 0.00035041015 |
| 10,723 | UniClean: A Scalable Data Cleaning Solution for Mixed Errors based on Unified Cleaners and Optimized Cleaning Workflow | 2025 | VLDB | 4.1945683e-05 |