Learning Over Dirty Data Without Cleaning
Summary: DLearn learns directly from dirty data, with no prior cleaning step, and so avoids the data-repair bottleneck. It uses database constraints to infer relational models that summarize the patterns holding across all plausible clean versions of the data. An empirical evaluation on large real-world datasets shows that it learns accurately and efficiently. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Jose Picado
2. John Davis
3. Arash Termehchy
4. Ga Young Lee
Incoming Citations (Sorted by Pagerank)
Showing 4 of 4 citing papers.
| Overall Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 4,967 | Leva: Boosting Machine Learning Performance with Relational Embedding Data Augmentation | 2022 | SIGMOD | 5.7956612e-05 |
| 9,348 | GIDCL: A Graph-Enhanced Interpretable Data Cleaning Framework with Large Language Models | 2024 | SIGMOD | 4.3526427e-05 |
| 9,856 | In-Database Data Imputation | 2024 | SIGMOD | 4.269353e-05 |
| 10,953 | Certain and Approximately Certain Models for Statistical Learning | 2024 | SIGMOD | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 20 of 20 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 4,668 | PrivateClean: Data Cleaning and Differential Privacy | 2016 | SIGMOD | 6.0115918e-05 |
| 3,396 | Automatic Data Repair: Are We Ready to Deploy? | 2024 | VLDB | 7.1455126e-05 |
| 623 | Improving Data Quality: Consistency and Accuracy | 2007 | VLDB | 0.00018996374 |
| 7,237 | CleanM: An Optimizable Query Language for Unified Scale-Out Data Cleaning | 2017 | VLDB | 4.7928651e-05 |
| 13,232 | Data Cleaning in the Era of Data Science: Challenges and Opportunities | 2021 | CIDR | - |
| 732 | Discovering Data Quality Rules | 2008 | VLDB | 0.00017465093 |
| 5,929 | ActiveClean: An Interactive Data Cleaning Framework For Modern Machine Learning | 2016 | SIGMOD | 5.2682177e-05 |
| 11,137 | Generalizable Data Cleaning of Tabular Data in Latent Space | 2024 | VLDB | 4.1945683e-05 |
| 10,512 | Auto-Test: Learning Semantic-Domain Constraints for Unsupervised Error Detection in Tables | 2025 | SIGMOD | 4.1945683e-05 |
| 1,627 | Data Cleaning: Overview and Emerging Challenges | 2016 | SIGMOD | 0.00011086905 |