Database Paper Browser

CleanM: An Optimizable Query Language for Unified Scale-Out Data Cleaning

Summary: CleanM is a unified, optimizable query language for scale-out data cleaning. Its three-level translation process enables cross-operator optimization. Running on top of CleanDB, it covers more corruption types and scales better than existing data cleaning solutions, and it unifies querying and cleaning under a single interface. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 11431
Venue: VLDB
Year: 2017
Pagerank: 4.7928651e-05
Overall Rank: 7,237 | 49.66%
DOI: -

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 4 of 4 citing papers.

Outgoing Citations (Sorted by Pagerank)

Showing 22 of 22 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
66 | Spark SQL: Relational Data Processing in Spark | 2015 | SIGMOD | 0.00061639801
112 | Potter's Wheel: An Interactive Data Cleaning System | 2001 | VLDB | 0.00047045036
168 | MAD Skills: New Analysis Practices for Big Data | 2009 | VLDB | 0.00038946305
322 | Record Linkage: Similarity Measures and Algorithms | 2006 | SIGMOD | 0.00027518768
489 | Data Curation at Scale: The Data Tamer System | 2013 | CIDR | 0.00022030728
1,012 | NADEEF: A Commodity Data Cleaning System | 2013 | SIGMOD | 0.0001464733
1,074 | Processing Theta-Joins using MapReduce* | 2011 | SIGMOD | 0.00014260096
1,343 | NoDB: Efficient Query Execution on Raw Data Files | 2012 | SIGMOD | 0.00012482538
1,794 | Summingbird: A Framework for Integrating Batch and Online MapReduce Computations | 2014 | VLDB | 0.00010532024
2,018 | Statistical Distortion: Consequences of Data Cleaning | 2012 | VLDB | 9.7764643e-05
2,184 | A Sample-and-Clean Framework for Fast and Accurate Query Processing on Dirty Data | 2014 | SIGMOD | 9.3429789e-05
2,231 | Dedoop: Efficient Deduplication with Hadoop | 2012 | VLDB | 9.2304499e-05
2,740 | String Similarity Joins: An Experimental Evaluation | 2014 | VLDB | 8.1980628e-05
2,946 | BigDansing: A System for Big Data Cleansing | 2015 | SIGMOD | 7.8372441e-05
3,141 | ClusterJoin: A Similarity Joins Framework using Map-Reduce | 2014 | VLDB | 7.4829448e-05
3,548 | Adaptive Query Processing on RAW Data | 2014 | VLDB | 6.9859242e-05
4,326 | Fast Queries Over Heterogeneous Data Through Engine Customization | 2016 | VLDB | 6.288323e-05
5,382 | That's All Folks! LLUNATIC Goes Open Source | 2014 | VLDB | 5.5397633e-05
5,586 | QuERy: A Framework for Integrating Entity Resolution with Query Processing | 2016 | VLDB | 5.4219548e-05
5,729 | KATARA: Reliable Data Cleaning with Knowledge Bases and Crowdsourcing | 2015 | VLDB | 5.3506368e-05
6,407 | Just-In-Time Data Virtualization: Lightweight Data Management with ViDa | 2015 | CIDR | 5.076547e-05
8,593 | Wisteria: Nurturing Scalable Data Cleaning Infrastructure | 2015 | VLDB | 4.4891474e-05

Semantically Similar Papers