Database Paper Browser

Back to papers

Horizon: Scalable Dependency-driven Data Cleaning

Summary: Horizon: scalable, dependency-driven FD repair. An end-to-end data-cleaning system that preserves the most frequent data patterns to boost accuracy, while delivering a linear-time repair algorithm that scales to millions of records and outperforms prior cleaners. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID
12428
Venue
VLDB
Year
2021
Pagerank
5.6607963e-05
Overall Rank
5,153 | 64.16%
DOI
10.14778/3476249.3476301

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 16 of 16 citing papers.

Rank Citing Paper Year Venue Pagerank
3,396 Automatic Data Repair: Are We Ready to Deploy? 2024 VLDB 7.1455126e-05
6,280 Self-supervised and Interpretable Data Cleaning with Sequence Generative Adversarial Networks 2023 VLDB 5.1290457e-05
6,944 DataPrism: Exposing Disconnect between Data and Systems 2022 SIGMOD 4.8912787e-05
8,745 Sparcle: Boosting the Accuracy of Data Cleaning Systems through Spatial Awareness 2024 VLDB 4.456315e-05
9,043 Query-Guided Resolution in Uncertain Databases 2023 SIGMOD 4.4039656e-05
9,348 GIDCL: A Graph-Enhanced Interpretable Data Cleaning Framework with Large Language Models 2024 SIGMOD 4.3526427e-05
9,434 Rock: Cleaning Data by Embedding ML in Logic Rules 2024 SIGMOD 4.3430376e-05
9,856 In-Database Data Imputation 2024 SIGMOD 4.269353e-05
9,984 Towards Scalable Visual Data Wrangling via Direct Manipulation 2026 CIDR 4.1945683e-05
10,026 Minimum Change ≠ Best Cleaning: Parallel and Incremental Error Detection under Integrity Constraints 2026 SIGMOD 4.1945683e-05
10,213 Stress-Testing Causal Claims via Cardinality Repairs 2026 SIGMOD 4.1945683e-05
10,723 UniClean: A Scalable Data Cleaning Solution for Mixed Errors based on Unified Cleaners and Optimized Cleaning Workflow 2025 VLDB 4.1945683e-05
11,069 Hardware-Efficient Data Imputation through DBMS Extensibility 2024 VLDB 4.1945683e-05
11,137 Generalizable Data Cleaning of Tabular Data in Latent Space 2024 VLDB 4.1945683e-05
11,178 LinCQA: Faster Consistent Query Answering with Linear Time Guarantees 2023 SIGMOD 4.1945683e-05
11,223 Splitting Tuples of Mismatched Entities 2023 SIGMOD 4.1945683e-05
Previous Page 1 / 1 Next

Outgoing Citations (Sorted by Pagerank)

Showing 16 of 16 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank Cited Paper Year Venue Pagerank
13 Mining Association Rules between Sets of Items in Large Databases 1993 SIGMOD 0.0010864752
192 HoloClean: Holistic Data Repairs with Probabilistic Inference 2017 VLDB 0.00035728858
265 A Cost-Based Model and Effective Heuristic for Repairing Constraints by Value Modification 2005 SIGMOD 0.00029763412
623 Improving Data Quality: Consistency and Accuracy 2007 VLDB 0.00018996374
881 Don’t be SCAREd: Use SCalable Automatic REpairing with Maximal Likelihood and Bounded Changes 2013 SIGMOD 0.00015661103
1,012 NADEEF: A Commodity Data Cleaning System 2013 SIGMOD 0.0001464733
1,188 On Generating Near-Optimal Tableaux for Conditional Functional Dependencies 2008 VLDB 0.00013441729
1,197 The LLUNATIC Data-Cleaning Framework 2013 VLDB 0.00013390321
1,277 The Data Civilizer System 2017 CIDR 0.00012879695
1,624 Sampling the Repairs of Functional Dependency Violations under Hard Constraints 2010 VLDB 0.00011099222
1,894 Baran: Effective Error Correction via a Unified Context Representation and Transfer Learning 2020 VLDB 0.0001018378
2,184 A Sample-and-Clean Framework for Fast and Accurate Query Processing on Dirty Data 2014 SIGMOD 9.3429789e-05
2,266 Estimating the Confidence of Conditional Functional Dependencies 2009 SIGMOD 9.1540815e-05
2,638 Messing Up with BART: Error Generation for Evaluating Data-Cleaning Algorithms 2016 VLDB 8.399764e-05
5,684 Dagger: A Data (not code) Debugger 2020 CIDR 5.3720749e-05
5,729 KATARA: Reliable Data Cleaning with Knowledge Bases and Crowdsourcing 2015 VLDB 5.3506368e-05
Previous Page 1 / 1 Next

Semantically Similar Papers