BigDansing: A System for Big Data Cleansing
Summary: BigDansing is a scalable system for big data cleansing. It lets users express data quality rules declaratively or procedurally and compiles them into distributed transformations that use shared scans and specialized joins on top of a DBMS or MapReduce, delivering up to 100x speedups while preserving repair quality. (summarized by gpt-5-nano on Feb 09 2026)
Incoming Non-self Citations Over Time
Authors
1. Zuhair Khayyat
2. Ihab F. Ilyas
3. Alekh Jindal
4. Samuel Madden
5. Mourad Ouzzani
6. Paolo Papotti
7. Jorge-Arnulfo Quiané-Ruiz
8. Nan Tang
9. Si Yin
Incoming Citations (Sorted by Pagerank)
Showing 34 of 34 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 19 of 19 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1,012 | NADEEF: A Commodity Data Cleaning System | 2013 | SIGMOD | 0.0001464733 |
| 13,232 | Data Cleaning in the Era of Data Science: Challenges and Opportunities | 2021 | CIDR | - |
| 7,013 | Qualitative Data Cleaning | 2016 | VLDB | 0.000048619024 |
| 199 | Declarative Data Cleaning: Language, Model, and Algorithms | 2001 | VLDB | 0.00035041015 |
| 10,723 | UniClean: A Scalable Data Cleaning Solution for Mixed Errors based on Unified Cleaners and Optimized Cleaning Workflow | 2025 | VLDB | 0.000041945683 |
| 1,627 | Data Cleaning: Overview and Emerging Challenges | 2016 | SIGMOD | 0.00011086905 |
| 1,277 | The Data Civilizer System | 2017 | CIDR | 0.00012879695 |
| 4,273 | Cleaning Denial Constraint Violations through Relaxation | 2020 | SIGMOD | 0.000063003864 |
| 7,237 | CleanM: An Optimizable Query Language for Unified Scale-Out Data Cleaning | 2017 | VLDB | 0.000047928651 |
| 5,660 | Descriptive and Prescriptive Data Cleaning | 2014 | SIGMOD | 0.000053847321 |