Data Quality and Data Cleaning: An Overview
Summary: An overview of data quality and data cleaning. Poor data distorts findings and disrupts operations, and ensuring reliability can take 80–90% of the effort in a data project. The paper takes a multidisciplinary approach (management science, statistics, database research, and metadata management) and proposes updated metrics, a case study, and directions for future database research. (summarized by gpt-5-nano on Feb 09 2026)
Authors
Incoming Citations (Sorted by Pagerank)
Showing 4 of 4 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 322 | Record Linkage: Similarity Measures and Algorithms | 2006 | SIGMOD | 0.00027518768 |
| 1,627 | Data Cleaning: Overview and Emerging Challenges | 2016 | SIGMOD | 0.00011086905 |
| 2,066 | DBLife: A Community Information Management Platform for the Database Research Community | 2007 | CIDR | 9.6399561e-05 |
| 9,430 | Approximate Joins: Concepts and Techniques | 2005 | VLDB | 4.3441378e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 0 of 0 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 6,526 | Data Collection and Quality Challenges for Deep Learning | 2020 | VLDB | 5.0267429e-05 |
| 3,396 | Automatic Data Repair: Are We Ready to Deploy? | 2024 | VLDB | 7.1455126e-05 |
| 8,329 | The Need for Data Quality | 1993 | VLDB | 4.5435639e-05 |
| 12,624 | Systematic Development of Data Mining-Based Data Quality Tools | 2003 | VLDB | 4.1945683e-05 |
| 732 | Discovering Data Quality Rules | 2008 | VLDB | 0.00017465093 |
| 623 | Improving Data Quality: Consistency and Accuracy | 2007 | VLDB | 0.00018996374 |
| 5,660 | Descriptive and Prescriptive Data Cleaning | 2014 | SIGMOD | 5.3847321e-05 |
| 1,612 | Detecting Data Errors: Where are we and what needs to be done? | 2016 | VLDB | 0.00011142794 |
| 1,627 | Data Cleaning: Overview and Emerging Challenges | 2016 | SIGMOD | 0.00011086905 |
| 7,013 | Qualitative Data Cleaning | 2016 | VLDB | 4.8619024e-05 |