Horizon: Scalable Dependency-driven Data Cleaning
Summary: Horizon is an end-to-end, dependency-driven data-cleaning system that repairs functional dependency (FD) violations. It preserves the most frequent data patterns to boost accuracy, and its linear-time repair algorithm scales to millions of records and outperforms prior cleaners.
(summarized by gpt-5-nano on Feb 09 2026)
- Paper ID: 12428
- Venue: VLDB
- Year: 2021
- Pagerank: 5.6607963e-05
- Overall Rank: 5,153 | 64.16%
- DOI: 10.14778/3476249.3476301
Incoming Non-self Citations Over Time
Incoming Citations (Sorted by Pagerank)
Showing 16 of 16 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
| 3,396 | Automatic Data Repair: Are We Ready to Deploy? | 2024 | VLDB | 7.1455126e-05 |
| 6,280 | Self-supervised and Interpretable Data Cleaning with Sequence Generative Adversarial Networks | 2023 | VLDB | 5.1290457e-05 |
| 6,944 | DataPrism: Exposing Disconnect between Data and Systems | 2022 | SIGMOD | 4.8912787e-05 |
| 8,745 | Sparcle: Boosting the Accuracy of Data Cleaning Systems through Spatial Awareness | 2024 | VLDB | 4.456315e-05 |
| 9,043 | Query-Guided Resolution in Uncertain Databases | 2023 | SIGMOD | 4.4039656e-05 |
| 9,348 | GIDCL: A Graph-Enhanced Interpretable Data Cleaning Framework with Large Language Models | 2024 | SIGMOD | 4.3526427e-05 |
| 9,434 | Rock: Cleaning Data by Embedding ML in Logic Rules | 2024 | SIGMOD | 4.3430376e-05 |
| 9,856 | In-Database Data Imputation | 2024 | SIGMOD | 4.269353e-05 |
| 9,984 | Towards Scalable Visual Data Wrangling via Direct Manipulation | 2026 | CIDR | 4.1945683e-05 |
| 10,026 | Minimum Change ≠ Best Cleaning: Parallel and Incremental Error Detection under Integrity Constraints | 2026 | SIGMOD | 4.1945683e-05 |
| 10,213 | Stress-Testing Causal Claims via Cardinality Repairs | 2026 | SIGMOD | 4.1945683e-05 |
| 10,723 | UniClean: A Scalable Data Cleaning Solution for Mixed Errors based on Unified Cleaners and Optimized Cleaning Workflow | 2025 | VLDB | 4.1945683e-05 |
| 11,069 | Hardware-Efficient Data Imputation through DBMS Extensibility | 2024 | VLDB | 4.1945683e-05 |
| 11,137 | Generalizable Data Cleaning of Tabular Data in Latent Space | 2024 | VLDB | 4.1945683e-05 |
| 11,178 | LinCQA: Faster Consistent Query Answering with Linear Time Guarantees | 2023 | SIGMOD | 4.1945683e-05 |
| 11,223 | Splitting Tuples of Mismatched Entities | 2023 | SIGMOD | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 16 of 16 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
| 13 | Mining Association Rules between Sets of Items in Large Databases | 1993 | SIGMOD | 0.0010864752 |
| 192 | HoloClean: Holistic Data Repairs with Probabilistic Inference | 2017 | VLDB | 0.00035728858 |
| 265 | A Cost-Based Model and Effective Heuristic for Repairing Constraints by Value Modification | 2005 | SIGMOD | 0.00029763412 |
| 623 | Improving Data Quality: Consistency and Accuracy | 2007 | VLDB | 0.00018996374 |
| 881 | Don’t be SCAREd: Use SCalable Automatic REpairing with Maximal Likelihood and Bounded Changes | 2013 | SIGMOD | 0.00015661103 |
| 1,012 | NADEEF: A Commodity Data Cleaning System | 2013 | SIGMOD | 0.0001464733 |
| 1,188 | On Generating Near-Optimal Tableaux for Conditional Functional Dependencies | 2008 | VLDB | 0.00013441729 |
| 1,197 | The LLUNATIC Data-Cleaning Framework | 2013 | VLDB | 0.00013390321 |
| 1,277 | The Data Civilizer System | 2017 | CIDR | 0.00012879695 |
| 1,624 | Sampling the Repairs of Functional Dependency Violations under Hard Constraints | 2010 | VLDB | 0.00011099222 |
| 1,894 | Baran: Effective Error Correction via a Unified Context Representation and Transfer Learning | 2020 | VLDB | 0.0001018378 |
| 2,184 | A Sample-and-Clean Framework for Fast and Accurate Query Processing on Dirty Data | 2014 | SIGMOD | 9.3429789e-05 |
| 2,266 | Estimating the Confidence of Conditional Functional Dependencies | 2009 | SIGMOD | 9.1540815e-05 |
| 2,638 | Messing Up with BART: Error Generation for Evaluating Data-Cleaning Algorithms | 2016 | VLDB | 8.399764e-05 |
| 5,684 | Dagger: A Data (not code) Debugger | 2020 | CIDR | 5.3720749e-05 |
| 5,729 | KATARA: Reliable Data Cleaning with Knowledge Bases and Crowdsourcing | 2015 | VLDB | 5.3506368e-05 |
Semantically Similar Papers