Database Paper Browser

Auto-Validate: Unsupervised Data Validation Using Data-Domain Patterns Inferred from Data Lakes

Summary: An unsupervised, corpus-driven approach infers data-domain patterns from data lakes and uses them to validate data automatically, reducing false positives on string-valued data. An evaluation on a production data lake shows improved detection of data-quality issues compared with prior methods. The summary also mentions the Azure Purview Auto-Tag feature. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 6141
Venue: SIGMOD
Year: 2021
Pagerank: 4.6377995e-05
Overall Rank: 7,838 | 45.48%
DOI: 10.1145/3448016.3457250

Incoming Citations (Sorted by Pagerank)

Showing 6 of 6 citing papers.

Outgoing Citations (Sorted by Pagerank)

Showing 21 of 21 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
22 | SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets | 2008 | VLDB | 0.0008456613
112 | Potter's Wheel: An Interactive Data Cleaning System | 2001 | VLDB | 0.00047045036
224 | CORDS: Automatic Discovery of Correlations and Soft Functional Dependencies | 2004 | SIGMOD | 0.00032746205
475 | Mining Database Structure; Or, How to Build a Data Quality Browser | 2002 | SIGMOD | 0.00022303253
555 | Discovering Denial Constraints | 2013 | VLDB | 0.00020254908
732 | Discovering Data Quality Rules | 2008 | VLDB | 0.00017465093
894 | A Hybrid Approach to Functional Dependency Discovery | 2016 | SIGMOD | 0.00015556428
1,337 | HoloDetect: Few-Shot Learning for Error Detection | 2019 | SIGMOD | 0.00012497164
1,420 | Data Management Challenges in Production Machine Learning | 2017 | SIGMOD | 0.00012057956
1,482 | Automating Large-Scale Data Quality Verification | 2018 | VLDB | 0.00011725533
2,158 | Uni-Detect: A Unified Approach to Automated Error Detection in Tables | 2019 | SIGMOD | 9.4141354e-05
2,506 | Auto-Detect: Data-Driven Error Detection in Tables | 2018 | SIGMOD | 8.6335464e-05
2,574 | Discovery of Genuine Functional Dependencies from Relational Data with Missing Values | 2018 | VLDB | 8.5173637e-05
2,888 | Sato: Contextual Semantic Type Detection in Tables | 2020 | VLDB | 7.9594996e-05
2,968 | Raha: A Configuration-Free Error Detection System | 2019 | SIGMOD | 7.7985097e-05
3,141 | ClusterJoin: A Similarity Joins Framework using Map-Reduce | 2014 | VLDB | 7.4829448e-05
3,299 | SCODED: Statistical Constraint Oriented Data Error Detection | 2020 | SIGMOD | 7.2546659e-05
4,929 | Data Auditor: Exploring Data Quality and Semantics using Pattern Tableaux | 2010 | VLDB | 5.8217296e-05
5,205 | ANMAT: Automatic Knowledge Discovery and Error Detection through Pattern Functional Dependencies | 2019 | SIGMOD | 5.630869e-05
6,416 | Synthesizing Type-Detection Logic for Rich Semantic Data Types using Open-source Code | 2018 | SIGMOD | 5.072267e-05
6,993 | Unit Testing Data with Deequ | 2019 | SIGMOD | 4.8693227e-05

Semantically Similar Papers