KATARA: Reliable Data Cleaning with Knowledge Bases and Crowdsourcing
Summary: KATARA is an end-to-end data cleaning system that uses knowledge bases and crowdsourcing to produce reliable repairs. It interprets table semantics against a KB, flags whether data is correct, and outputs top-k repairs; it also adds browser-based setup, pattern validation, data annotation, and repair-status visualization. (summarized by gpt-5-nano on Feb 09 2026)
Incoming Non-self Citations Over Time
Authors
1. Xu Chu
2. Mourad Ouzzani
3. John Morcos
4. Paolo Papotti
5. Ihab F. Ilyas
6. Nan Tang
7. Yin Ye
Incoming Citations (Sorted by Pagerank)
Showing 8 of 8 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 5,153 | Horizon: Scalable Dependency-driven Data Cleaning | 2021 | VLDB | 5.6607963e-05 |
| 6,187 | Semi-Supervised Data Cleaning with Raha and Baran | 2021 | CIDR | 5.1656857e-05 |
| 6,280 | Self-supervised and Interpretable Data Cleaning with Sequence Generative Adversarial Networks | 2023 | VLDB | 5.1290457e-05 |
| 7,237 | CleanM: An Optimizable Query Language for Unified Scale-Out Data Cleaning | 2017 | VLDB | 4.7928651e-05 |
| 8,092 | Saga: A Scalable Framework for Optimizing Data Cleaning Pipelines for Machine Learning Applications | 2023 | SIGMOD | 4.587921e-05 |
| 9,849 | Reptile: Aggregation-level Explanations for Hierarchical Data | 2022 | SIGMOD | 4.2721228e-05 |
| 10,821 | Demonstrating Matelda for Multi-Table Error Detection | 2025 | VLDB | 4.1945683e-05 |
| 11,731 | A Demonstration of PERC: Probabilistic Entity Resolution With Crowd Errors | 2018 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 9 of 9 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 364 | Annotating and Searching Web Tables Using Entities, Types and Relationships | 2010 | VLDB | 0.00025637562 |
| 656 | ERACER: A Database Approach for Statistical Inference and Data Cleaning | 2010 | SIGMOD | 0.00018588729 |
| 833 | Guided Data Repair | 2011 | VLDB | 0.00016138432 |
| 881 | Don’t be SCAREd: Use SCalable Automatic REpairing with Maximal Likelihood and Bounded Changes | 2013 | SIGMOD | 0.00015661103 |
| 1,159 | Towards Certain Fixes with Editing Rules and Master Data | 2010 | VLDB | 0.00013592813 |
| 1,546 | KATARA: A Data Cleaning System Powered by Knowledge Bases and Crowdsourcing | 2015 | SIGMOD | 0.00011446851 |
| 2,823 | Interaction between Record Matching and Data Repairing | 2011 | SIGMOD | 8.0593894e-05 |
| 2,847 | Building, Maintaining, and Using Knowledge Bases: A Report from the Trenches | 2013 | SIGMOD | 8.0224023e-05 |
| 2,946 | BigDansing: A System for Big Data Cleansing | 2015 | SIGMOD | 7.8372441e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 199 | Declarative Data Cleaning: Language, Model, and Algorithms | 2001 | VLDB | 0.00035041015 |
| 4,416 | CrowdMatcher: Crowd-Assisted Schema Matching | 2014 | SIGMOD | 6.2039225e-05 |
| 7,564 | PIClean: A Probabilistic and Interactive Data Cleaning System | 2019 | SIGMOD | 4.7093702e-05 |
| 2,888 | Sato: Contextual Semantic Type Detection in Tables | 2020 | VLDB | 7.9594996e-05 |
| 9,278 | Interactive and Deterministic Data Cleaning: A Tossed Stone Raises a Thousand Ripples | 2016 | SIGMOD | 4.3639892e-05 |
| 489 | Data Curation at Scale: The Data Tamer System | 2013 | CIDR | 0.00022030728 |
| 10,512 | Auto-Test: Learning Semantic-Domain Constraints for Unsupervised Error Detection in Tables | 2025 | SIGMOD | 4.1945683e-05 |
| 6,187 | Semi-Supervised Data Cleaning with Raha and Baran | 2021 | CIDR | 5.1656857e-05 |
| 10,821 | Demonstrating Matelda for Multi-Table Error Detection | 2025 | VLDB | 4.1945683e-05 |
| 1,546 | KATARA: A Data Cleaning System Powered by Knowledge Bases and Crowdsourcing | 2015 | SIGMOD | 0.00011446851 |