Database Paper Browser

Back to papers

Baran: Effective Error Correction via a Unified Context Representation and Transfer Learning

Summary: Baran introduces a unified context representation and transfer learning to fuse multiple error-corrector models for data repair. Modeling full context—value, tuple co-occurrences, and attribute type—yields richer candidates and higher precision, with Wikipedia pretraining boosting recall and needing ~20 labeled tuples. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID
12091
Venue
VLDB
Year
2020
Pagerank
0.0001018378
Overall Rank
1,894 | 86.83%
DOI
10.14778/3407790.3407801

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 32 of 32 citing papers.

Rank Citing Paper Year Venue Pagerank
2,349 RPT: Relational Pre-trained Transformer Is Almost All You Need towards Democratizing Data Preparation 2021 VLDB 8.9876423e-05
2,587 Table-GPT: Table Fine-tuned GPT for Diverse Table Tasks 2024 SIGMOD 8.4924618e-05
3,311 Efficient and Effective Data Imputation with Influence Functions 2022 VLDB 7.2406486e-05
3,396 Automatic Data Repair: Are We Ready to Deploy? 2024 VLDB 7.1455126e-05
5,028 Adaptive Data Augmentation for Supervised Learning over Missing Data 2021 VLDB 5.7506746e-05
5,153 Horizon: Scalable Dependency-driven Data Cleaning 2021 VLDB 5.6607963e-05
5,978 Rotom: A Meta-Learned Data Augmentation Framework for Entity Matching, Data Cleaning, Text Classification, and Beyond 2021 SIGMOD 5.2453012e-05
6,187 Semi-Supervised Data Cleaning with Raha and Baran 2021 CIDR 5.1656857e-05
6,280 Self-supervised and Interpretable Data Cleaning with Sequence Generative Adversarial Networks 2023 VLDB 5.1290457e-05
7,449 OTClean: Data Cleaning for Conditional Independence Violations using Optimal Transport 2024 SIGMOD 4.7269357e-05
7,667 Fast Detection of Denial Constraint Violations 2022 VLDB 4.683767e-05
8,092 Saga: A Scalable Framework for Optimizing Data Cleaning Pipelines for Machine Learning Applications 2023 SIGMOD 4.587921e-05
8,745 Sparcle: Boosting the Accuracy of Data Cleaning Systems through Spatial Awareness 2024 VLDB 4.456315e-05
9,348 GIDCL: A Graph-Enhanced Interpretable Data Cleaning Framework with Large Language Models 2024 SIGMOD 4.3526427e-05
9,389 DataVinci: Learning Syntactic and Semantic String Repairs 2025 SIGMOD 4.3441378e-05
9,434 Rock: Cleaning Data by Embedding ML in Logic Rules 2024 SIGMOD 4.3430376e-05
9,479 Data Imputation with Limited Data Redundancy Using Data Lakes 2025 VLDB 4.3341665e-05
9,646 Discovering Functional Dependencies through Hitting Set Enumeration 2024 SIGMOD 4.3109001e-05
9,777 Data Augmentation for ML-driven Data Preparation and Integration 2021 VLDB 4.2856106e-05
9,847 Discovering Top-k Relevant and Diversified Rules 2024 SIGMOD 4.2721228e-05
9,849 Reptile: Aggregation-level Explanations for Hierarchical Data 2022 SIGMOD 4.2721228e-05
9,856 In-Database Data Imputation 2024 SIGMOD 4.269353e-05
9,984 Towards Scalable Visual Data Wrangling via Direct Manipulation 2026 CIDR 4.1945683e-05
10,478 Data Enhancement for Binary Classification of Relational Data 2025 SIGMOD 4.1945683e-05
10,512 Auto-Test: Learning Semantic-Domain Constraints for Unsupervised Error Detection in Tables 2025 SIGMOD 4.1945683e-05
10,598 Auto-Prep: Holistic Prediction of Data Preparation Steps for Self-Service Business Intelligence 2025 VLDB 4.1945683e-05
10,723 UniClean: A Scalable Data Cleaning Solution for Mixed Errors based on Unified Cleaners and Optimized Cleaning Workflow 2025 VLDB 4.1945683e-05
10,821 Demonstrating Matelda for Multi-Table Error Detection 2025 VLDB 4.1945683e-05
11,054 Enriching Relations with Additional Attributes for ER 2024 VLDB 4.1945683e-05
11,111 Rock: Cleaning Data with both ML and Logic Rules 2024 VLDB 4.1945683e-05
11,223 Splitting Tuples of Mismatched Entities 2023 SIGMOD 4.1945683e-05
11,454 Contextual Data Cleaning with Ontology FDs 2021 SIGMOD 4.1945683e-05
Previous Page 1 / 1 Next

Outgoing Citations (Sorted by Pagerank)

Showing 19 of 19 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank Cited Paper Year Venue Pagerank
192 HoloClean: Holistic Data Repairs with Probabilistic Inference 2017 VLDB 0.00035728858
791 ActiveClean: Interactive Data Cleaning For Statistical Modeling 2016 VLDB 0.00016629664
833 Guided Data Repair 2011 VLDB 0.00016138432
881 Don’t be SCAREd: Use SCalable Automatic REpairing with Maximal Likelihood and Bounded Changes 2013 SIGMOD 0.00015661103
894 A Hybrid Approach to Functional Dependency Discovery 2016 SIGMOD 0.00015556428
1,012 NADEEF: A Commodity Data Cleaning System 2013 SIGMOD 0.0001464733
1,197 The LLUNATIC Data-Cleaning Framework 2013 VLDB 0.00013390321
1,211 Truth Finding on the Deep Web: Is the Problem Solved? 2013 VLDB 0.00013257101
1,277 The Data Civilizer System 2017 CIDR 0.00012879695
1,337 HoloDetect: Few-Shot Learning for Error Detection 2019 SIGMOD 0.00012497164
1,469 BlinkFill: Semi-supervised Programming By Example for Syntactic String Transformations 2016 VLDB 0.00011836053
1,546 KATARA: A Data Cleaning System Powered by Knowledge Bases and Crowdsourcing 2015 SIGMOD 0.00011446851
1,612 Detecting Data Errors: Where are we and what needs to be done? 2016 VLDB 0.00011142794
2,158 Uni-Detect: A Unified Approach to Automated Error Detection in Tables 2019 SIGMOD 9.4141354e-05
2,184 A Sample-and-Clean Framework for Fast and Accurate Query Processing on Dirty Data 2014 SIGMOD 9.3429789e-05
2,460 Combining Quantitative and Logical Data Cleaning 2016 VLDB 8.7617484e-05
2,638 Messing Up with BART: Error Generation for Evaluating Data-Cleaning Algorithms 2016 VLDB 8.399764e-05
2,968 Raha: A Configuration-Free Error Detection System 2019 SIGMOD 7.7985097e-05
3,451 Learning String Transformations From Examples 2009 VLDB 7.0822216e-05
Previous Page 1 / 1 Next

Semantically Similar Papers