Database Paper Browser

GIDCL: A Graph-Enhanced Interpretable Data Cleaning Framework with Large Language Models

Summary: GIDCL converts relational tables into graphs and applies Graph Neural Networks to exploit structural correlations among the data for cleaning. A creator-critic workflow with LLMs generates interpretable cleaning rules and tailors feature engineering from minimal labeled data, improving F1 by about 10% on average with only 20 labeled tuples. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 6984
Venue: SIGMOD
Year: 2024
Pagerank: 4.3526427e-05
Overall Rank: 9,348 | 34.97%
DOI: 10.1145/3698811

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 2 of 2 citing papers.

Outgoing Citations (Sorted by Pagerank)

Showing 43 of 43 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
49 | Consistent Query Answers in Inconsistent Databases | 1999 | PODS | 0.00067660624
192 | HoloClean: Holistic Data Repairs with Probabilistic Inference | 2017 | VLDB | 0.00035728858
513 | TURL: Table Understanding through Representation Learning | 2021 | VLDB | 0.00021288342
517 | Can Foundation Models Wrangle Your Data? | 2023 | VLDB | 0.00021169035
555 | Discovering Denial Constraints | 2013 | VLDB | 0.00020254908
623 | Improving Data Quality: Consistency and Accuracy | 2007 | VLDB | 0.00018996374
791 | ActiveClean: Interactive Data Cleaning For Statistical Modeling | 2016 | VLDB | 0.00016629664
833 | Guided Data Repair | 2011 | VLDB | 0.00016138432
881 | Don’t be SCAREd: Use SCalable Automatic REpairing with Maximal Likelihood and Bounded Changes | 2013 | SIGMOD | 0.00015661103
1,159 | Towards Certain Fixes with Editing Rules and Master Data | 2010 | VLDB | 0.00013592813
1,197 | The LLUNATIC Data-Cleaning Framework | 2013 | VLDB | 0.00013390321
1,211 | Truth Finding on the Deep Web: Is the Problem Solved? | 2013 | VLDB | 0.00013257101
1,337 | HoloDetect: Few-Shot Learning for Error Detection | 2019 | SIGMOD | 0.00012497164
1,546 | KATARA: A Data Cleaning System Powered by Knowledge Bases and Crowdsourcing | 2015 | SIGMOD | 0.00011446851
1,612 | Detecting Data Errors: Where are we and what needs to be done? | 2016 | VLDB | 0.00011142794
1,894 | Baran: Effective Error Correction via a Unified Context Representation and Transfer Learning | 2020 | VLDB | 0.0001018378
2,158 | Uni-Detect: A Unified Approach to Automated Error Detection in Tables | 2019 | SIGMOD | 9.4141354e-05
2,184 | A Sample-and-Clean Framework for Fast and Accurate Query Processing on Dirty Data | 2014 | SIGMOD | 9.3429789e-05
2,253 | Efficient Denial Constraint Discovery with Hydra | 2018 | VLDB | 9.1937209e-05
2,302 | Nearest Neighbor Classifiers over Incomplete Information: From Certain Answers to Certain Predictions | 2021 | VLDB | 9.0668832e-05
2,349 | RPT: Relational Pre-trained Transformer Is Almost All You Need towards Democratizing Data Preparation | 2021 | VLDB | 8.9876423e-05
2,483 | Discovery of Approximate (and Exact) Denial Constraints | 2020 | VLDB | 8.6864916e-05
2,506 | Auto-Detect: Data-Driven Error Detection in Tables | 2018 | SIGMOD | 8.6335464e-05
2,638 | Messing Up with BART: Error Generation for Evaluating Data-Cleaning Algorithms | 2016 | VLDB | 8.399764e-05
2,968 | Raha: A Configuration-Free Error Detection System | 2019 | SIGMOD | 7.7985097e-05
3,440 | Approximate Denial Constraints | 2020 | VLDB | 7.0918817e-05
4,273 | Cleaning Denial Constraint Violations through Relaxation | 2020 | SIGMOD | 6.3003864e-05
4,904 | Temporal Rules Discovery for Web Data Cleaning | 2016 | VLDB | 5.8399195e-05
5,153 | Horizon: Scalable Dependency-driven Data Cleaning | 2021 | VLDB | 5.6607963e-05
5,192 | Pattern Functional Dependencies for Data Cleaning | 2020 | VLDB | 5.6375087e-05
5,618 | Explaining Repaired Data with CFDs | 2018 | VLDB | 5.4079415e-05
5,803 | Semandaq: A Data Quality System Based on Conditional Functional Dependencies | 2008 | VLDB | 5.3205861e-05
5,978 | Rotom: A Meta-Learned Data Augmentation Framework for Entity Matching, Data Cleaning, Text Classification, and Beyond | 2021 | SIGMOD | 5.2453012e-05
6,187 | Semi-Supervised Data Cleaning with Raha and Baran | 2021 | CIDR | 5.1656857e-05
6,280 | Self-supervised and Interpretable Data Cleaning with Sequence Generative Adversarial Networks | 2023 | VLDB | 5.1290457e-05
6,350 | NADEEF: A Generalized Data Cleaning System | 2013 | VLDB | 5.101815e-05
6,690 | Parallel Discrepancy Detection and Incremental Detection | 2021 | VLDB | 4.9621556e-05
7,066 | On Multiple Semantics for Declarative Database Repairs | 2020 | SIGMOD | 4.8445108e-05
7,867 | Learning Over Dirty Data Without Cleaning | 2020 | SIGMOD | 4.6320452e-05
8,422 | Deducing Certain Fixes to Graphs | 2019 | VLDB | 4.5167705e-05
8,875 | CerFix: A System for Cleaning Data with Certain Fixes | 2011 | VLDB | 4.430475e-05
9,369 | Constraint-Variance Tolerant Data Repairing | 2016 | SIGMOD | 4.3481081e-05
9,963 | Parallel Rule Discovery from Large Datasets by Sampling | 2022 | SIGMOD | 4.2294678e-05

Semantically Similar Papers