SCODED: Statistical Constraint Oriented Data Error Detection
Summary: SCODED uses Statistical Constraints (SCs) for data cleaning; SCs align with integrity constraints and support both insight into the data and downstream use. The system has two parts, SC Violation Detection and Error Drill Down (top-k). Experiments on synthetic data show that SCs outperform the latest methods.
(summarized by gpt-5-nano on Feb 09 2026)
- Paper ID: 5781
- Venue: SIGMOD
- Year: 2020
- Pagerank: 7.2546659e-05
- Overall Rank: 3,299 | 77.06%
- DOI: 10.1145/3318464.3380568
Incoming Non-self Citations Over Time
Incoming Citations (Sorted by Pagerank)
Showing 16 of 16 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 3,252 | Auto-Suggest: Learning-to-Recommend Data Preparation Steps Using Data Science Notebooks | 2020 | SIGMOD | 7.3178277e-05 |
| 5,222 | Enabling SQL-based Training Data Debugging for Federated Learning | 2022 | VLDB | 5.6210545e-05 |
| 6,690 | Parallel Discrepancy Detection and Incremental Detection | 2021 | VLDB | 4.9621556e-05 |
| 6,944 | DataPrism: Exposing Disconnect between Data and Systems | 2022 | SIGMOD | 4.8912787e-05 |
| 7,202 | Conformance Constraint Discovery: Measuring Trust in Data-Driven Systems | 2021 | SIGMOD | 4.8023314e-05 |
| 7,449 | OTClean: Data Cleaning for Conditional Independence Violations using Optimal Transport | 2024 | SIGMOD | 4.7269357e-05 |
| 7,667 | Fast Detection of Denial Constraint Violations | 2022 | VLDB | 4.683767e-05 |
| 7,838 | Auto-Validate: Unsupervised Data Validation Using Data-Domain Patterns Inferred from Data Lakes | 2021 | SIGMOD | 4.6377995e-05 |
| 7,926 | CoCo: Interactive Exploration of Conformance Constraints for Data Understanding and Data Cleaning | 2021 | SIGMOD | 4.6144554e-05 |
| 9,410 | Leveraging Application Data Constraints to Optimize Database-Backed Web Applications | 2023 | VLDB | 4.3441378e-05 |
| 9,434 | Rock: Cleaning Data by Embedding ML in Logic Rules | 2024 | SIGMOD | 4.3430376e-05 |
| 9,560 | MTSClean: Efficient Constraint-based Cleaning for Multi-Dimensional Time Series Data | 2024 | VLDB | 4.3254416e-05 |
| 10,019 | Guardrail: Automated Integrity Constraint Synthesis From Noisy Data | 2026 | SIGMOD | 4.1945683e-05 |
| 10,026 | Minimum Change ≠ Best Cleaning: Parallel and Incremental Error Detection under Integrity Constraints | 2026 | SIGMOD | 4.1945683e-05 |
| 10,213 | Stress-Testing Causal Claims via Cardinality Repairs | 2026 | SIGMOD | 4.1945683e-05 |
| 10,598 | Auto-Prep: Holistic Prediction of Data Preparation Steps for Self-Service Business Intelligence | 2025 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 22 of 22 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 192 | HoloClean: Holistic Data Repairs with Probabilistic Inference | 2017 | VLDB | 0.00035728858 |
| 214 | Scorpion: Explaining Away Outliers in Aggregate Queries | 2013 | VLDB | 0.0003363692 |
| 224 | CORDS: Automatic Discovery of Correlations and Soft Functional Dependencies | 2004 | SIGMOD | 0.00032746205 |
| 265 | A Cost-Based Model and Effective Heuristic for Repairing Constraints by Value Modification | 2005 | SIGMOD | 0.00029763412 |
| 555 | Discovering Denial Constraints | 2013 | VLDB | 0.00020254908 |
| 656 | ERACER: A Database Approach for Statistical Inference and Data Cleaning | 2010 | SIGMOD | 0.00018588729 |
| 881 | Don’t be SCAREd: Use SCalable Automatic REpairing with Maximal Likelihood and Bounded Changes | 2013 | SIGMOD | 0.00015661103 |
| 942 | A Formal Approach to Finding Explanations for Database Queries | 2014 | SIGMOD | 0.00015155714 |
| 1,041 | Interventional Fairness: Causal Database Repair for Algorithmic Fairness | 2019 | SIGMOD | 0.00014482047 |
| 1,337 | HoloDetect: Few-Shot Learning for Error Detection | 2019 | SIGMOD | 0.00012497164 |
| 1,546 | KATARA: A Data Cleaning System Powered by Knowledge Bases and Crowdsourcing | 2015 | SIGMOD | 0.00011446851 |
| 1,612 | Detecting Data Errors: Where are we and what needs to be done? | 2016 | VLDB | 0.00011142794 |
| 1,627 | Data Cleaning: Overview and Emerging Challenges | 2016 | SIGMOD | 0.00011086905 |
| 2,158 | Uni-Detect: A Unified Approach to Automated Error Detection in Tables | 2019 | SIGMOD | 9.4141354e-05 |
| 2,506 | Auto-Detect: Data-Driven Error Detection in Tables | 2018 | SIGMOD | 8.6335464e-05 |
| 2,797 | Query-Oriented Data Cleaning with Oracles | 2015 | SIGMOD | 8.1108589e-05 |
| 2,810 | Bias in OLAP Queries: Detection, Explanation, and Removal (Or Think Twice About Your AVG-Query) | 2018 | SIGMOD | 8.0810163e-05 |
| 2,968 | Raha: A Configuration-Free Error Detection System | 2019 | SIGMOD | 7.7985097e-05 |
| 3,105 | Data X-Ray: A Diagnostic Tool for Data Errors | 2015 | SIGMOD | 7.5568954e-05 |
| 5,445 | QFix: Diagnosing Errors through Query Histories | 2017 | SIGMOD | 5.5020909e-05 |
| 5,929 | ActiveClean: An Interactive Data Cleaning Framework For Modern Machine Learning | 2016 | SIGMOD | 5.2682177e-05 |
| 7,262 | HypDB: A Demonstration of Detecting, Explaining and Resolving Bias in OLAP queries | 2018 | VLDB | 4.78584e-05 |
Semantically Similar Papers