DataVinci: Learning Syntactic and Semantic String Repairs
Summary: DataVinci is a fully unsupervised detector and repairer of string errors. It learns column-wide regex patterns and flags values that deviate from them, handles strings that mix syntactic and semantic substrings through an LLM-based abstraction, and uses data-program traces to derive repairs. (summarized by gpt-5-nano on Feb 09 2026)
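The pattern-based detection idea in the summary can be illustrated with a minimal sketch. This is not DataVinci's algorithm: it swaps in a much simpler technique (abstract each value into a coarse regex, take the most common one as the column pattern, flag values that do not match) and leaves out the LLM-based semantic abstraction and the repair step entirely. The names `shape`, `flag_deviations`, and `min_support` are hypothetical and chosen only for this example.

```python
import re
from collections import Counter

# Tokenizer: runs of ASCII digits, runs of ASCII letters, or any single other character.
TOKEN = re.compile(r"(?P<digits>[0-9]+)|(?P<letters>[A-Za-z]+)|(?P<other>.)", re.DOTALL)


def shape(value):
    """Abstract a string into a coarse regex (hypothetical helper, not from the paper)."""
    parts = []
    for match in TOKEN.finditer(value):
        if match.lastgroup == "digits":
            parts.append("[0-9]+")
        elif match.lastgroup == "letters":
            parts.append("[A-Za-z]+")
        else:
            # Keep punctuation and other characters as literals.
            parts.append(re.escape(match.group()))
    return "".join(parts)


def flag_deviations(column, min_support=0.5):
    """Return (dominant_pattern, values that do not match it).

    Assumption for this sketch: a pattern only counts as "column-wide"
    if at least `min_support` of the values share it; otherwise nothing is flagged.
    """
    if not column:
        return None, []
    counts = Counter(shape(value) for value in column)
    pattern, support = counts.most_common(1)[0]
    if support / len(column) < min_support:
        return pattern, []
    regex = re.compile(pattern)
    return pattern, [value for value in column if not regex.fullmatch(value)]


if __name__ == "__main__":
    dates = ["2021-03-04", "2020-11-30", "1999-01-01", "03/04/2021", "2022-7-9"]
    pattern, flagged = flag_deviations(dates)
    print(pattern)
    print(flagged)
```

On the sample column, only the slash-separated value is flagged. The value `2022-7-9` passes because this coarse abstraction ignores digit-run lengths, which shows why a purely syntactic pattern of this kind is a weak stand-in for a learned one.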
Authors
1. Mukul Singh
2. José Cambronero
3. Sumit Gulwani
4. Vu Le
5. Carina Negreanu
6. Arjun Radhakrishna
7. Gust Verbruggen
Incoming Citations (Sorted by Pagerank)
Showing 1 of 1 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 11,137 | Generalizable Data Cleaning of Tabular Data in Latent Space | 2024 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 12 of 12 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 192 | HoloClean: Holistic Data Repairs with Probabilistic Inference | 2017 | VLDB | 0.00035728858 |
| 7,867 | Learning Over Dirty Data Without Cleaning | 2020 | SIGMOD | 4.6320452e-05 |
| 1,159 | Towards Certain Fixes with Editing Rules and Master Data | 2010 | VLDB | 0.00013592813 |
| 2,158 | Uni-Detect: A Unified Approach to Automated Error Detection in Tables | 2019 | SIGMOD | 9.4141354e-05 |
| 10,811 | DemandClean: A Multi-Objective Learning Framework for Balancing Model Tolerance to Data Authenticity and Diversity | 2025 | VLDB | 4.1945683e-05 |
| 3,396 | Automatic Data Repair: Are We Ready to Deploy? | 2024 | VLDB | 7.1455126e-05 |
| 3,192 | Towards Dependable Data Repairing with Fixing Rules | 2014 | SIGMOD | 7.4095761e-05 |
| 2,506 | Auto-Detect: Data-Driven Error Detection in Tables | 2018 | SIGMOD | 8.6335464e-05 |
| 3,230 | Learning Semantic String Transformations from Examples | 2012 | VLDB | 7.339123e-05 |
| 10,512 | Auto-Test: Learning Semantic-Domain Constraints for Unsupervised Error Detection in Tables | 2025 | SIGMOD | 4.1945683e-05 |