Database Paper Browser

Guided Data Repair

Summary: Guided Data Repair (GDR) combines user feedback with automated repairs to speed up data cleaning and reduce manual effort. Candidate repairs are grouped and ranked using value of information (VOI) and active learning, and a machine-learning model then applies updates automatically. The evaluation on real data measures user effort against repair quality. (summarized by gpt-5-nano on Feb 09 2026)
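
To make the ranking idea in the summary concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's formulation: the `Repair` fields, the expected-benefit score used as a stand-in for VOI, the uncertainty measure, and the 0.9 auto-apply threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Repair:
    """A candidate update: set `attribute` of row `row_id` to `value`."""
    row_id: int
    attribute: str
    value: str
    p_correct: float  # hypothetical model estimate that the repair is right
    benefit: float    # hypothetical quality gain if the repair is applied


def group_score(repairs: List[Repair]) -> float:
    """Expected benefit of reviewing a group (a simple stand-in for VOI)."""
    return sum(r.p_correct * r.benefit for r in repairs)


def uncertainty(repair: Repair) -> float:
    """Highest when p_correct is 0.5, zero when the model is certain."""
    return 0.5 - abs(repair.p_correct - 0.5)


def rank_for_feedback(groups: Dict[str, List[Repair]]) -> List[Tuple[str, List[Repair]]]:
    """Order groups by score, and repairs inside each group by uncertainty."""
    ordered = sorted(groups.items(), key=lambda kv: group_score(kv[1]), reverse=True)
    return [(name, sorted(reps, key=uncertainty, reverse=True)) for name, reps in ordered]


def auto_apply(repairs: List[Repair], threshold: float = 0.9) -> List[Repair]:
    """Repairs the model is confident enough to apply without asking the user."""
    return [r for r in repairs if r.p_correct >= threshold]


# Made-up example data, for illustration only.
groups = {
    "city=Chicago": [
        Repair(1, "city", "Chicago", p_correct=0.95, benefit=1.0),
        Repair(2, "city", "Chicago", p_correct=0.55, benefit=1.0),
    ],
    "zip=46825": [Repair(3, "zip", "46825", p_correct=0.60, benefit=0.5)],
}

ranking = rank_for_feedback(groups)
for name, repairs in ranking:
    print(name, [r.row_id for r in repairs])
```

In this sketch the user would be asked about the highest-scoring group first, starting with the repair the model is least sure of, while high-confidence repairs could be applied without review.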

Paper ID: 10257
Venue: VLDB
Year: 2011
Pagerank: 0.00016138432
Overall Rank: 833 | 94.21%
DOI: -

[Figure: Incoming non-self citations over time]

Incoming Citations (Sorted by Pagerank)

Showing 48 of 48 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
791 | ActiveClean: Interactive Data Cleaning For Statistical Modeling | 2016 | VLDB | 0.00016629664
881 | Don’t be SCAREd: Use SCalable Automatic REpairing with Maximal Likelihood and Bounded Changes | 2013 | SIGMOD | 0.00015661103
939 | Data Lake Management: Challenges and Opportunities | 2019 | VLDB | 0.00015187344
1,012 | NADEEF: A Commodity Data Cleaning System | 2013 | SIGMOD | 0.0001464733
1,197 | The LLUNATIC Data-Cleaning Framework | 2013 | VLDB | 0.00013390321
1,350 | Northstar: An Interactive Data Science System | 2018 | VLDB | 0.00012431059
1,482 | Automating Large-Scale Data Quality Verification | 2018 | VLDB | 0.00011725533
1,546 | KATARA: A Data Cleaning System Powered by Knowledge Bases and Crowdsourcing | 2015 | SIGMOD | 0.00011446851
1,627 | Data Cleaning: Overview and Emerging Challenges | 2016 | SIGMOD | 0.00011086905
1,894 | Baran: Effective Error Correction via a Unified Context Representation and Transfer Learning | 2020 | VLDB | 0.0001018378
2,184 | A Sample-and-Clean Framework for Fast and Accurate Query Processing on Dirty Data | 2014 | SIGMOD | 9.3429789e-05
2,349 | RPT: Relational Pre-trained Transformer Is Almost All You Need towards Democratizing Data Preparation | 2021 | VLDB | 8.9876423e-05
2,566 | Database Repairs and Consistent Query Answering: Origins and Further Developments | 2019 | PODS | 8.5243847e-05
2,946 | BigDansing: A System for Big Data Cleansing | 2015 | SIGMOD | 7.8372441e-05
2,968 | Raha: A Configuration-Free Error Detection System | 2019 | SIGMOD | 7.7985097e-05
3,105 | Data X-Ray: A Diagnostic Tool for Data Errors | 2015 | SIGMOD | 7.5568954e-05
3,192 | Towards Dependable Data Repairing with Fixing Rules | 2014 | SIGMOD | 7.4095761e-05
3,773 | Cleaning Crowdsourced Labels Using Oracles for Statistical Classification | 2019 | VLDB | 6.7758649e-05
3,976 | UGuide – User-Guided Discovery of FD-Detectable Errors | 2017 | SIGMOD | 6.5736462e-05
5,028 | Adaptive Data Augmentation for Supervised Learning over Missing Data | 2021 | VLDB | 5.7506746e-05
5,032 | Actively Soliciting Feedback for Query Answers in Keyword Search-Based Data Integration | 2013 | VLDB | 5.748807e-05
5,445 | QFix: Diagnosing Errors through Query Histories | 2017 | SIGMOD | 5.5020909e-05
5,618 | Explaining Repaired Data with CFDs | 2018 | VLDB | 5.4079415e-05
5,729 | KATARA: Reliable Data Cleaning with Knowledge Bases and Crowdsourcing | 2015 | VLDB | 5.3506368e-05
5,929 | ActiveClean: An Interactive Data Cleaning Framework For Modern Machine Learning | 2016 | SIGMOD | 5.2682177e-05
6,280 | Self-supervised and Interpretable Data Cleaning with Sequence Generative Adversarial Networks | 2023 | VLDB | 5.1290457e-05
6,350 | NADEEF: A Generalized Data Cleaning System | 2013 | VLDB | 5.101815e-05
7,013 | Qualitative Data Cleaning | 2016 | VLDB | 4.8619024e-05
7,648 | User Guidance for Efficient Fact Checking | 2019 | VLDB | 4.6889787e-05
7,766 | ICARUS: Minimizing Human Effort in Iterative Data Completion | 2018 | VLDB | 4.6564959e-05
7,867 | Learning Over Dirty Data Without Cleaning | 2020 | SIGMOD | 4.6320452e-05
8,092 | Saga: A Scalable Framework for Optimizing Data Cleaning Pipelines for Machine Learning Applications | 2023 | SIGMOD | 4.587921e-05
8,362 | Minimizing Efforts in Validating Crowd Answers | 2015 | SIGMOD | 4.5366717e-05
8,422 | Deducing Certain Fixes to Graphs | 2019 | VLDB | 4.5167705e-05
8,590 | Exploratory Training: When Annotators Learn About Data | 2023 | SIGMOD | 4.4896282e-05
8,729 | OneProvenance: Efficient Extraction of Dynamic Coarse-Grained Provenance From Database Query Event Logs | 2023 | VLDB | 4.4582221e-05
8,875 | CerFix: A System for Cleaning Data with Certain Fixes | 2011 | VLDB | 4.430475e-05
9,056 | A Data Quality Metric (DQM): How to Estimate the Number of Undetected Errors in Data Sets | 2017 | VLDB | 4.4039656e-05
9,278 | Interactive and Deterministic Data Cleaning: A Tossed Stone Raises a Thousand Ripples | 2016 | SIGMOD | 4.3639892e-05
9,348 | GIDCL: A Graph-Enhanced Interpretable Data Cleaning Framework with Large Language Models | 2024 | SIGMOD | 4.3526427e-05
9,560 | MTSClean: Efficient Constraint-based Cleaning for Multi-Dimensional Time Series Data | 2024 | VLDB | 4.3254416e-05
9,849 | Reptile: Aggregation-level Explanations for Hierarchical Data | 2022 | SIGMOD | 4.2721228e-05
10,395 | User-Centric Property Graph Repairs | 2025 | SIGMOD | 4.1945683e-05
10,845 | Versatile Property Graph Transformations | 2025 | VLDB | 4.1945683e-05
10,855 | bNDCRepair: Cleaning both Data Errors and Inaccurate Constraints on Numerical Sequential Data | 2025 | VLDB | 4.1945683e-05
11,454 | Contextual Data Cleaning with Ontology FDs | 2021 | SIGMOD | 4.1945683e-05
11,536 | LOCATER: Cleaning WiFi Connectivity Datasets for Semantic Localization | 2021 | VLDB | 4.1945683e-05
11,770 | Staging User Feedback toward Rapid Conflict Resolution in Data Fusion | 2017 | SIGMOD | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 11 of 11 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Semantically Similar Papers