Error Diagnosis and Data Profiling with Data X-Ray
Summary: Data X-Ray is a large-scale diagnostic framework for error diagnosis and data profiling that targets root causes rather than symptoms. It combines a Bayesian cost model with a parallel, top-down diagnostic algorithm to uncover hidden connections and common properties among erroneous elements, and it provides an interactive interface for tuning parameters and comparing results against alternatives. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Xiaolan Wang
2. Mary Feng
3. Yue Wang
4. Xin Luna Dong
5. Alexandra Meliou
Incoming Citations (Sorted by Pagerank)
Showing 4 of 4 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 939 | Data Lake Management: Challenges and Opportunities | 2019 | VLDB | 0.00015187344 |
| 5,222 | Enabling SQL-based Training Data Debugging for Federated Learning | 2022 | VLDB | 5.6210545e-05 |
| 6,475 | Explain3D: Explaining Disagreements in Disjoint Datasets | 2019 | VLDB | 5.0497183e-05 |
| 7,022 | A Unified Approach for Resilience and Causal Responsibility with Integer Linear Programming (ILP) and LP Relaxations | 2023 | SIGMOD | 4.8576599e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 5 of 5 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 112 | Potter's Wheel: An Interactive Data Cleaning System | 2001 | VLDB | 0.00047045036 |
| 1,188 | On Generating Near-Optimal Tableaux for Conditional Functional Dependencies | 2008 | VLDB | 0.00013441729 |
| 2,379 | A Revival of Integrity Constraints for Data Cleaning | 2008 | VLDB | 8.9392633e-05 |
| 3,105 | Data X-Ray: A Diagnostic Tool for Data Errors | 2015 | SIGMOD | 7.5568954e-05 |
| 4,929 | Data Auditor: Exploring Data Quality and Semantics using Pattern Tableaux | 2010 | VLDB | 5.8217296e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 11,841 | BART in Action: Error Generation and Empirical Evaluations of Data-Cleaning Systems | 2016 | SIGMOD | 4.1945683e-05 |
| 1,627 | Data Cleaning: Overview and Emerging Challenges | 2016 | SIGMOD | 0.00011086905 |
| 7,013 | Qualitative Data Cleaning | 2016 | VLDB | 4.8619024e-05 |
| 3,396 | Automatic Data Repair: Are We Ready to Deploy? | 2024 | VLDB | 7.1455126e-05 |
| 6,384 | A Demonstration of DBWipes: Clean as You Query | 2012 | VLDB | 5.0880333e-05 |
| 10,026 | Minimum Change ≠ Best Cleaning: Parallel and Incremental Error Detection under Integrity Constraints | 2026 | SIGMOD | 4.1945683e-05 |
| 1,612 | Detecting Data Errors: Where are we and what needs to be done? | 2016 | VLDB | 0.00011142794 |
| 5,445 | QFix: Diagnosing Errors through Query Histories | 2017 | SIGMOD | 5.5020909e-05 |
| 11,837 | QFix: Demonstrating Error Diagnosis in Query Histories | 2016 | SIGMOD | 4.1945683e-05 |
| 3,105 | Data X-Ray: A Diagnostic Tool for Data Errors | 2015 | SIGMOD | 7.5568954e-05 |