Database Paper Browser

Back to papers

Messing Up with BART: Error Generation for Evaluating Data-Cleaning Algorithms

Summary: Error-generation for benchmarking data-cleaning; enables precise control and large-scale testing. NP-complete; a scalable greedy algorithm sacrifices completeness, aided by a symmetry property of data-quality constraints, exposing the control–scalability trade-off. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID
11327
Venue
VLDB
Year
2016
Pagerank
8.399764e-05
Overall Rank
2,638 | 81.65%
DOI
-

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 23 of 23 citing papers.

Rank Citing Paper Year Venue Pagerank
1,337 HoloDetect: Few-Shot Learning for Error Detection 2019 SIGMOD 0.00012497164
1,612 Detecting Data Errors: Where are we and what needs to be done? 2016 VLDB 0.00011142794
1,894 Baran: Effective Error Correction via a Unified Context Representation and Transfer Learning 2020 VLDB 0.0001018378
1,914 Creating Embeddings of Heterogeneous Relational Datasets for Data Integration Tasks 2020 SIGMOD 0.00010109102
2,968 Raha: A Configuration-Free Error Detection System 2019 SIGMOD 7.7985097e-05
3,396 Automatic Data Repair: Are We Ready to Deploy? 2024 VLDB 7.1455126e-05
3,976 UGuide – User-Guided Discovery of FD-Detectable Errors 2017 SIGMOD 6.5736462e-05
4,273 Cleaning Denial Constraint Violations through Relaxation 2020 SIGMOD 6.3003864e-05
5,153 Horizon: Scalable Dependency-driven Data Cleaning 2021 VLDB 5.6607963e-05
5,618 Explaining Repaired Data with CFDs 2018 VLDB 5.4079415e-05
6,475 Explain3D: Explaining Disagreements in Disjoint Datasets 2019 VLDB 5.0497183e-05
6,739 Benchmarking Approximate Consistent Query Answering 2021 PODS 4.9449088e-05
8,149 Why Not Match: On Explanations of Event Pattern Queries 2021 SIGMOD 4.5752863e-05
8,590 Exploratory Training: When Annotators Learn About Data 2023 SIGMOD 4.4896282e-05
9,278 Interactive and Deterministic Data Cleaning: A Tossed Stone Raises a Thousand Ripples 2016 SIGMOD 4.3639892e-05
9,348 GIDCL: A Graph-Enhanced Interpretable Data Cleaning Framework with Large Language Models 2024 SIGMOD 4.3526427e-05
9,649 DAFDiscover: Robust Mining Algorithm for Dynamic Approximate Functional Dependencies on Dirty Data 2024 VLDB 4.3109001e-05
10,019 Guardrail: Automated Integrity Constraint Synthesis From Noisy Data 2026 SIGMOD 4.1945683e-05
10,026 Minimum Change ≠ Best Cleaning: Parallel and Incremental Error Detection under Integrity Constraints 2026 SIGMOD 4.1945683e-05
10,723 UniClean: A Scalable Data Cleaning Solution for Mixed Errors based on Unified Cleaners and Optimized Cleaning Workflow 2025 VLDB 4.1945683e-05
11,178 LinCQA: Faster Consistent Query Answering with Linear Time Guarantees 2023 SIGMOD 4.1945683e-05
11,388 Frost: A Platform for Benchmarking and Exploring Data Matching Results 2022 VLDB 4.1945683e-05
11,841 BART in Action: Error Generation and Empirical Evaluations of Data-Cleaning Systems 2016 SIGMOD 4.1945683e-05
Previous Page 1 / 1 Next

Outgoing Citations (Sorted by Pagerank)

Showing 13 of 13 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Previous Page 1 / 1 Next

Semantically Similar Papers