Database Paper Browser

Data X-Ray: A Diagnostic Tool for Data Errors

Summary: Data X-Ray reframes cleaning as diagnosing errors in data generation, not purging them. A Bayesian cost model guides diagnostics; an efficient parallel algorithm scales to large datasets, delivering better diagnoses and large speedups over prior methods. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 5081
Venue: SIGMOD
Year: 2015
Pagerank: 7.5568954e-05
Overall Rank: 3,105 | 78.41%
DOI: 10.1145/2723372.2750549

Authors

Incoming Citations (Sorted by Pagerank)

Showing 22 of 22 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
2,126 | MacroBase: Prioritizing Attention in Fast Data | 2017 | SIGMOD | 9.4887794e-05
2,154 | DIFF: A Relational Interface for Large-Scale Data Explanation | 2019 | VLDB | 9.4208667e-05
2,460 | Combining Quantitative and Logical Data Cleaning | 2016 | VLDB | 8.7617484e-05
2,753 | Complaint-driven Training Data Debugging for Query 2.0 | 2020 | SIGMOD | 8.1724339e-05
3,299 | SCODED: Statistical Constraint Oriented Data Error Detection | 2020 | SIGMOD | 7.2546659e-05
4,607 | Data Integration and Machine Learning: A Natural Synergy | 2018 | SIGMOD | 6.0538827e-05
5,445 | QFix: Diagnosing Errors through Query Histories | 2017 | SIGMOD | 5.5020909e-05
6,475 | Explain3D: Explaining Disagreements in Disjoint Datasets | 2019 | VLDB | 5.0497183e-05
6,696 | Approximate Summaries for Why and Why-not Provenance | 2020 | VLDB | 4.9581958e-05
6,779 | Explaining Inference Queries with Bayesian Optimization | 2021 | VLDB | 4.9280116e-05
6,817 | Error Diagnosis and Data Profiling with Data X-Ray | 2015 | VLDB | 4.9171711e-05
6,944 | DataPrism: Exposing Disconnect between Data and Systems | 2022 | SIGMOD | 4.8912787e-05
8,341 | BugDoc: Algorithms to Debug Computational Processes | 2020 | SIGMOD | 4.5433282e-05
8,743 | CtxPipe: Context-aware Data Preparation Pipeline Construction for Machine Learning | 2024 | SIGMOD | 4.456315e-05
8,853 | Complaint-Driven Training Data Debugging at Interactive Speeds | 2022 | SIGMOD | 4.4350727e-05
9,024 | Causality-Guided Adaptive Interventional Debugging | 2020 | SIGMOD | 4.4075011e-05
9,220 | BugDoc: A System for Debugging Computational Pipelines | 2020 | SIGMOD | 4.3702188e-05
9,533 | TSExplain: Surfacing Evolving Explanations for Time Series | 2021 | SIGMOD | 4.3269636e-05
10,269 | Database Views as Explanations for Relational Deep Learning | 2026 | VLDB | 4.1945683e-05
10,875 | SDEcho: Efficient Explanation of Aggregated Sequence Difference | 2025 | VLDB | 4.1945683e-05
11,837 | QFix: Demonstrating Error Diagnosis in Query Histories | 2016 | SIGMOD | 4.1945683e-05
11,906 | Knowledge Curation and Knowledge Fusion: Challenges, Models, and Applications | 2015 | SIGMOD | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 29 of 29 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
31 | Provenance Semirings | 2007 | PODS | 0.0007857786
37 | Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud | 2012 | VLDB | 0.0007522744
112 | Potter's Wheel: An Interactive Data Cleaning System | 2001 | VLDB | 0.00047045036
214 | Scorpion: Explaining Away Outliers in Aggregate Queries | 2013 | VLDB | 0.0003363692
322 | Record Linkage: Similarity Measures and Algorithms | 2006 | SIGMOD | 0.00027518768
371 | A Bayesian Approach to Discovering Truth from Conflicting Sources for Data Integration | 2012 | VLDB | 0.00025389696
623 | Improving Data Quality: Consistency and Accuracy | 2007 | VLDB | 0.00018996374
691 | AJAX: An Extensible Data Cleaning Tool | 2000 | SIGMOD | 0.00018086135
833 | Guided Data Repair | 2011 | VLDB | 0.00016138432
923 | Provenance and Scientific Workflows: Challenges and Opportunities | 2008 | SIGMOD | 0.0001527609
942 | A Formal Approach to Finding Explanations for Database Queries | 2014 | SIGMOD | 0.00015155714
1,099 | Interpretable and Informative Explanations of Outcomes | 2015 | VLDB | 0.00014096312
1,119 | The Complexity of Causality and Responsibility for Query Answers and non-Answers | 2011 | VLDB | 0.0001386199
1,188 | On Generating Near-Optimal Tableaux for Conditional Functional Dependencies | 2008 | VLDB | 0.00013441729
1,534 | PerfXplain: Debugging MapReduce Job Performance | 2012 | VLDB | 0.00011468393
1,624 | Sampling the Repairs of Functional Dependency Violations under Hard Constraints | 2010 | VLDB | 0.00011099222
2,028 | Putting Lipstick on Pig: Enabling Database-style Workflow Provenance | 2012 | VLDB | 9.7433981e-05
2,379 | A Revival of Integrity Constraints for Data Cleaning | 2008 | VLDB | 8.9392633e-05
2,402 | Causality and Explanations in Databases | 2014 | VLDB | 8.8928361e-05
2,420 | From Data Fusion to Knowledge Fusion | 2014 | VLDB | 8.8530994e-05
2,452 | Data Fusion – Resolving Data Conflicts for Integration | 2009 | VLDB | 8.7839322e-05
2,602 | Tracing Data Errors with View-Conditioned Causality | 2011 | SIGMOD | 8.4667197e-05
2,852 | MRI: Meaningful Interpretations of Collaborative Ratings | 2011 | VLDB | 8.0151391e-05
3,242 | Explanation-Based Auditing | 2012 | VLDB | 7.3301779e-05
4,383 | Incremental Record Linkage | 2014 | VLDB | 6.2383094e-05
4,929 | Data Auditor: Exploring Data Quality and Semantics using Pattern Tableaux | 2010 | VLDB | 5.8217296e-05
6,606 | Explainable Security for Relational Databases | 2014 | SIGMOD | 4.996456e-05
6,744 | MapRat: Meaningful Explanation, Interactive Exploration and Geo-Visualization of Collaborative Ratings | 2012 | VLDB | 4.9419773e-05
7,280 | I4E: Interactive Investigation of Iterative Information Extraction | 2010 | SIGMOD | 4.778826e-05

Semantically Similar Papers