Database Paper Browser

Auto-Test: Learning Semantic-Domain Constraints for Unsupervised Error Detection in Tables

Summary: Auto-Test learns Semantic-Domain Constraints from corpora for unsupervised error detection, removing per-table expert specification. Optimization-based distillation yields a provably reliable core that detects errors and augments cleaning; 2400-column benchmark and code released. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 7284
Venue: SIGMOD
Year: 2025
Pagerank: 4.1945683e-05
Overall Rank: 10,512 | 26.87%
DOI: 10.1145/3725396

Incoming Non-self Citations Over Time

No non-self incoming citations found for this paper in this database.

Incoming Citations (Sorted by Pagerank)

Showing 0 of 0 citing papers.

Outgoing Citations (Sorted by Pagerank)

Showing 21 of 21 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
161 | LOF: Identifying Density-Based Local Outliers | 2000 | SIGMOD | 0.00039846974
192 | HoloClean: Holistic Data Repairs with Probabilistic Inference | 2017 | VLDB | 0.00035728858
364 | Annotating and Searching Web Tables Using Entities, Types and Relationships | 2010 | VLDB | 0.00025637562
517 | Can Foundation Models Wrangle Your Data? | 2023 | VLDB | 0.00021169035
774 | Algorithms for Mining Distance-Based Outliers in Large Datasets | 1998 | VLDB | 0.00016865771
881 | Don’t be SCAREd: Use SCalable Automatic REpairing with Maximal Likelihood and Bounded Changes | 2013 | SIGMOD | 0.00015661103
1,894 | Baran: Effective Error Correction via a Unified Context Representation and Transfer Learning | 2020 | VLDB | 0.0001018378
2,506 | Auto-Detect: Data-Driven Error Detection in Tables | 2018 | SIGMOD | 8.6335464e-05
2,517 | Annotating Columns with Pre-trained Language Models | 2022 | SIGMOD | 8.6092139e-05
2,574 | Discovery of Genuine Functional Dependencies from Relational Data with Missing Values | 2018 | VLDB | 8.5173637e-05
2,587 | Table-GPT: Table Fine-tuned GPT for Diverse Table Tasks | 2024 | SIGMOD | 8.4924618e-05
2,946 | BigDansing: A System for Big Data Cleansing | 2015 | SIGMOD | 7.8372441e-05
2,968 | Raha: A Configuration-Free Error Detection System | 2019 | SIGMOD | 7.7985097e-05
5,099 | ArcheType: A Novel Framework for Open-Source Column Type Annotation using Large Language Models | 2024 | VLDB | 5.6997784e-05
5,981 | DataPrep.EDA: Task-Centric Exploratory Data Analysis for Statistical Modeling in Python | 2021 | SIGMOD | 5.2448986e-05
6,416 | Synthesizing Type-Detection Logic for Rich Semantic Data Types using Open-source Code | 2018 | SIGMOD | 5.072267e-05
6,993 | Unit Testing Data with Deequ | 2019 | SIGMOD | 4.8693227e-05
7,838 | Auto-Validate: Unsupervised Data Validation Using Data-Domain Patterns Inferred from Data Lakes | 2021 | SIGMOD | 4.6377995e-05
8,579 | RECA: Related Tables Enhanced Column Semantic Type Annotation Framework | 2023 | VLDB | 4.4922446e-05
8,852 | Watchog: A Light-weight Contrastive Learning based Framework for Column Annotation | 2023 | SIGMOD | 4.4356508e-05
9,490 | Auto-BI: Automatically Build BI-Models Leveraging Local Join Prediction and Global Schema Graph | 2023 | VLDB | 4.3341665e-05

Semantically Similar Papers

Overall Rank | Paper | Year | Venue | Pagerank
1,627 | Data Cleaning: Overview and Emerging Challenges | 2016 | SIGMOD | 0.00011086905
7,867 | Learning Over Dirty Data Without Cleaning | 2020 | SIGMOD | 4.6320452e-05
1,612 | Detecting Data Errors: Where are we and what needs to be done? | 2016 | VLDB | 0.00011142794
3,396 | Automatic Data Repair: Are We Ready to Deploy? | 2024 | VLDB | 7.1455126e-05
1,482 | Automating Large-Scale Data Quality Verification | 2018 | VLDB | 0.00011725533
3,192 | Towards Dependable Data Repairing with Fixing Rules | 2014 | SIGMOD | 7.4095761e-05
623 | Improving Data Quality: Consistency and Accuracy | 2007 | VLDB | 0.00018996374
6,187 | Semi-Supervised Data Cleaning with Raha and Baran | 2021 | CIDR | 5.1656857e-05
2,158 | Uni-Detect: A Unified Approach to Automated Error Detection in Tables | 2019 | SIGMOD | 9.4141354e-05
2,506 | Auto-Detect: Data-Driven Error Detection in Tables | 2018 | SIGMOD | 8.6335464e-05