Self-supervised and Interpretable Data Cleaning with Sequence Generative Adversarial Networks
Summary: Garf is a SeqGAN-based, self-supervised framework that extracts interpretable conditional repair rules (e.g., city→county) directly from noisy tables. It combines a generator with two discriminators: D learns the dependencies in the data, and D' iteratively refines the rules and the data. The result is interpretable, high-accuracy cleaning without labeled data. (summarized by gpt-5-mini on Feb 09 2026)
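To make the idea of a conditional repair rule such as city→county concrete, here is a minimal Python sketch. It is not Garf's method: the SeqGAN generator and both discriminators are replaced by a simple majority vote, and the function names (`mine_rules`, `repair`), the thresholds, and the toy rows are all assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def mine_rules(rows, lhs, rhs, min_support=2, min_confidence=0.6):
    """Derive rules of the form lhs-value -> rhs-value by majority vote.

    Stand-in for rule extraction; Garf learns such rules with a SeqGAN,
    which this sketch does not attempt to reproduce.
    """
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[lhs]][row[rhs]] += 1

    rules = {}
    for key, counter in counts.items():
        value, freq = counter.most_common(1)[0]
        total = sum(counter.values())
        # Keep a rule only if it is seen often enough and dominates its group.
        if total >= min_support and freq / total >= min_confidence:
            rules[key] = value
    return rules


def repair(rows, lhs, rhs, rules):
    """Return copies of the rows with rhs overwritten wherever a rule applies."""
    return [{**row, rhs: rules.get(row[lhs], row[rhs])} for row in rows]


# Hypothetical noisy table: one county value is misspelled.
rows = [
    {"city": "Boston", "county": "Suffolk"},
    {"city": "Boston", "county": "Suffolk"},
    {"city": "Boston", "county": "Sufolk"},
    {"city": "Boston", "county": "Suffolk"},
    {"city": "Austin", "county": "Travis"},
    {"city": "Austin", "county": "Travis"},
]

rules = mine_rules(rows, "city", "county")
cleaned = repair(rows, "city", "county", rules)
```

Each mined rule is a readable key-value pair, which is the sense in which such rules are interpretable: a reviewer can inspect `rules` before any cell is overwritten.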
Authors
1. Jinfeng Peng
2. Derong Shen
3. Nan Tang
4. Tieying Liu
5. Yue Kou
6. Tiezheng Nie
7. Hang Cui
8. Ge Yu
Incoming Citations (Sorted by Pagerank)
Showing 6 of 6 citing papers.
| Overall Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 8,985 | TSM-Bench: Benchmarking Time Series Database Systems for Monitoring Applications | 2023 | VLDB | 4.4156106e-05 |
| 9,348 | GIDCL: A Graph-Enhanced Interpretable Data Cleaning Framework with Large Language Models | 2024 | SIGMOD | 4.3526427e-05 |
| 9,856 | In-Database Data Imputation | 2024 | SIGMOD | 4.269353e-05 |
| 10,511 | The Best of Both Worlds: On Repairing Timestamps and Attribute Values for Multivariate Time Series | 2025 | SIGMOD | 4.1945683e-05 |
| 11,109 | SEER: An End-to-End Toolkit for Benchmarking Time Series Database Systems in Monitoring Applications | 2024 | VLDB | 4.1945683e-05 |
| 11,137 | Generalizable Data Cleaning of Tabular Data in Latent Space | 2024 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 25 of 25 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 9,649 | DAFDiscover: Robust Mining Algorithm for Dynamic Approximate Functional Dependencies on Dirty Data | 2024 | VLDB | 4.3109001e-05 |
| 1,627 | Data Cleaning: Overview and Emerging Challenges | 2016 | SIGMOD | 0.00011086905 |
| 5,028 | Adaptive Data Augmentation for Supervised Learning over Missing Data | 2021 | VLDB | 5.7506746e-05 |
| 9,348 | GIDCL: A Graph-Enhanced Interpretable Data Cleaning Framework with Large Language Models | 2024 | SIGMOD | 4.3526427e-05 |
| 3,192 | Towards Dependable Data Repairing with Fixing Rules | 2014 | SIGMOD | 7.4095761e-05 |
| 10,512 | Auto-Test: Learning Semantic-Domain Constraints for Unsupervised Error Detection in Tables | 2025 | SIGMOD | 4.1945683e-05 |
| 7,867 | Learning Over Dirty Data Without Cleaning | 2020 | SIGMOD | 4.6320452e-05 |
| 3,396 | Automatic Data Repair: Are We Ready to Deploy? | 2024 | VLDB | 7.1455126e-05 |
| 11,137 | Generalizable Data Cleaning of Tabular Data in Latent Space | 2024 | VLDB | 4.1945683e-05 |
| 6,187 | Semi-Supervised Data Cleaning with Raha and Baran | 2021 | CIDR | 5.1656857e-05 |