Statistical Distortion: Consequences of Data Cleaning
Summary: Introduces statistical distortion as a metric for the impact of data cleaning. Proposes a scalable experimental framework that evaluates glitch improvement, statistical distortion, and cost, addressing gaps in existing metrics, and demonstrates it on real-world data. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Tamraparni Dasu
2. Ji Meng Loh
Incoming Citations (Sorted by Pagerank)
Showing 11 of 11 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 3 of 3 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 656 | ERACER: A Database Approach for Statistical Inference and Data Cleaning | 2010 | SIGMOD | 0.00018588729 |
| 2,686 | Online Data Fusion | 2011 | VLDB | 8.3053595e-05 |
| 3,713 | GDR: A System for Guided Data Repair | 2010 | SIGMOD | 6.8224341e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 2,460 | Combining Quantitative and Logical Data Cleaning | 2016 | VLDB | 8.7617484e-05 |
| 11,137 | Generalizable Data Cleaning of Tabular Data in Latent Space | 2024 | VLDB | 4.1945683e-05 |
| 9,056 | A Data Quality Metric (DQM): How to Estimate the Number of Undetected Errors in Data Sets | 2017 | VLDB | 4.4039656e-05 |
| 10,026 | Minimum Change ≠ Best Cleaning: Parallel and Incremental Error Detection under Integrity Constraints | 2026 | SIGMOD | 4.1945683e-05 |
| 13,232 | Data Cleaning in the Era of Data Science: Challenges and Opportunities | 2021 | CIDR | - |
| 3,396 | Automatic Data Repair: Are We Ready to Deploy? | 2024 | VLDB | 7.1455126e-05 |
| 507 | Data Quality and Data Cleaning: An Overview | 2003 | SIGMOD | 0.00021473263 |
| 1,612 | Detecting Data Errors: Where are we and what needs to be done? | 2016 | VLDB | 0.00011142794 |
| 7,013 | Qualitative Data Cleaning | 2016 | VLDB | 4.8619024e-05 |
| 1,627 | Data Cleaning: Overview and Emerging Challenges | 2016 | SIGMOD | 0.00011086905 |