Database Paper Browser


Auto-Detect: Data-Driven Error Detection in Tables

Summary: Auto-Detect uses co-occurrence statistics from table corpora to detect errors in a column, going beyond regex-like rules. An ensemble of generalization languages, scored with global statistics, handles diverse error types. Experiments on Wikipedia tables and Excel spreadsheets validate the approach, and a benchmark is released. (summarized by gpt-5-nano on Feb 09 2026)
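The summary's core idea can be sketched in Python. Each value is generalized into a pattern, and corpus-wide co-occurrence statistics are used to flag pairs of patterns that rarely appear together in the same column. This is a minimal sketch under stated assumptions, not the paper's implementation. The generalization function `generalize` (digits to `D`, letters to `L`, repeats collapsed), the toy corpus, the helper names, and the `threshold=-0.3` cutoff are all hypothetical. Scoring uses normalized pointwise mutual information (NPMI, ranging from -1 to 1), a standard choice for co-occurrence statistics; the paper's exact scoring may differ. The paper combines an ensemble of generalization languages, whereas this sketch uses a single one.

```python
import math
from collections import Counter
from itertools import combinations


def generalize(value: str) -> str:
    """Hypothetical generalization language: digits -> 'D', letters -> 'L',
    other characters kept verbatim, consecutive repeats collapsed."""
    out = []
    for ch in value:
        if ch.isdigit():
            tok = "D"
        elif ch.isalpha():
            tok = "L"
        else:
            tok = ch
        if not out or out[-1] != tok:
            out.append(tok)
    return "".join(out)


def pattern_stats(corpus):
    """Count how many corpus columns contain each pattern and each pattern pair."""
    single, pair = Counter(), Counter()
    for column in corpus:
        pats = sorted({generalize(v) for v in column})
        single.update(pats)
        pair.update(combinations(pats, 2))  # tuples are in sorted order
    return single, pair, len(corpus)


def npmi(p1, p2, single, pair, n):
    """Normalized PMI of two patterns co-occurring in the same column."""
    if p1 == p2:
        return 1.0
    c12 = pair[tuple(sorted((p1, p2)))]
    if c12 == 0:
        return -1.0  # never seen together: maximally incompatible
    px, py, pxy = single[p1] / n, single[p2] / n, c12 / n
    if pxy == 1.0:
        return 1.0  # both patterns appear in every column; avoid 0/0
    return math.log(pxy / (px * py)) / -math.log(pxy)


def flag_incompatible(column, single, pair, n, threshold=-0.3):
    """Return pattern pairs in `column` whose NPMI falls below a
    hypothetical fixed threshold (the paper does not use this value)."""
    pats = sorted({generalize(v) for v in column})
    flagged = []
    for a, b in combinations(pats, 2):
        score = npmi(a, b, single, pair, n)
        if score < threshold:
            flagged.append((a, b, round(score, 3)))
    return flagged


# Hypothetical toy corpus: each inner list is one column.
TOY_CORPUS = [
    ["2011-01-01", "2012-02-02"],
    ["2013-03-03", "2014-04-04"],
    ["Jan 1 2011", "Feb 2 2012"],
    ["1.5", "2", "3.25"],
    ["4", "5.5"],
]
SINGLE, PAIR, N = pattern_stats(TOY_CORPUS)

if __name__ == "__main__":
    print(flag_incompatible(["2015-05-05", "Mar 3 2013"], SINGLE, PAIR, N))
    # → [('D-D-D', 'L D D', -1.0)]
    print(flag_incompatible(["6", "7.75"], SINGLE, PAIR, N))
    # → []
```

In this toy corpus, ISO-style dates (`D-D-D`) and month-name dates (`L D D`) never share a column, so a column mixing them is flagged. Integers and decimals (`D` and `D.D`) do co-occur, so a column mixing them is not flagged.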

Paper ID: 5535
Venue: SIGMOD
Year: 2018
Pagerank: 8.6335464e-05
Overall Rank: 2,506 | 82.57%
DOI: 10.1145/3183713.3196889

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 17 of 17 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
1,337 | HoloDetect: Few-Shot Learning for Error Detection | 2019 | SIGMOD | 0.00012497164
2,158 | Uni-Detect: A Unified Approach to Automated Error Detection in Tables | 2019 | SIGMOD | 9.4141354e-05
2,587 | Table-GPT: Table Fine-tuned GPT for Diverse Table Tasks | 2024 | SIGMOD | 8.4924618e-05
3,252 | Auto-Suggest: Learning-to-Recommend Data Preparation Steps Using Data Science Notebooks | 2020 | SIGMOD | 7.3178277e-05
3,299 | SCODED: Statistical Constraint Oriented Data Error Detection | 2020 | SIGMOD | 7.2546659e-05
3,396 | Automatic Data Repair: Are We Ready to Deploy? | 2024 | VLDB | 7.1455126e-05
3,478 | Transform-Data-by-Example (TDE): An Extensible Search Engine for Data Transformations | 2018 | VLDB | 7.054159e-05
5,096 | Auto-Transform: Learning-to-Transform by Patterns | 2020 | VLDB | 5.7011825e-05
5,192 | Pattern Functional Dependencies for Data Cleaning | 2020 | VLDB | 5.6375087e-05
5,205 | ANMAT: Automatic Knowledge Discovery and Error Detection through Pattern Functional Dependencies | 2019 | SIGMOD | 5.630869e-05
7,766 | ICARUS: Minimizing Human Effort in Iterative Data Completion | 2018 | VLDB | 4.6564959e-05
7,838 | Auto-Validate: Unsupervised Data Validation Using Data-Domain Patterns Inferred from Data Lakes | 2021 | SIGMOD | 4.6377995e-05
9,348 | GIDCL: A Graph-Enhanced Interpretable Data Cleaning Framework with Large Language Models | 2024 | SIGMOD | 4.3526427e-05
9,389 | DataVinci: Learning Syntactic and Semantic String Repairs | 2025 | SIGMOD | 4.3441378e-05
10,026 | Minimum Change ≠ Best Cleaning: Parallel and Incremental Error Detection under Integrity Constraints | 2026 | SIGMOD | 4.1945683e-05
10,512 | Auto-Test: Learning Semantic-Domain Constraints for Unsupervised Error Detection in Tables | 2025 | SIGMOD | 4.1945683e-05
10,598 | Auto-Prep: Holistic Prediction of Data Preparation Steps for Self-Service Business Intelligence | 2025 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 14 of 14 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.


Semantically Similar Papers