| 192 | HoloClean: Holistic Data Repairs with Probabilistic Inference | 2017 | VLDB | 0.00035728858 |
| 517 | Can Foundation Models Wrangle Your Data? | 2023 | VLDB | 0.00021169035 |
| 1,277 | The Data Civilizer System | 2017 | CIDR | 0.00012879695 |
| 1,337 | HoloDetect: Few-Shot Learning for Error Detection | 2019 | SIGMOD | 0.00012497164 |
| 1,831 | Synthesizing Entity Matching Rules by Examples | 2018 | VLDB | 0.00010384082 |
| 1,894 | Baran: Effective Error Correction via a Unified Context Representation and Transfer Learning | 2020 | VLDB | 0.0001018378 |
| 1,914 | Creating Embeddings of Heterogeneous Relational Datasets for Data Integration Tasks | 2020 | SIGMOD | 0.00010109102 |
| 2,158 | Uni-Detect: A Unified Approach to Automated Error Detection in Tables | 2019 | SIGMOD | 9.4141354e-05 |
| 2,349 | RPT: Relational Pre-trained Transformer Is Almost All You Need towards Democratizing Data Preparation | 2021 | VLDB | 8.9876423e-05 |
| 2,506 | Auto-Detect: Data-Driven Error Detection in Tables | 2018 | SIGMOD | 8.6335464e-05 |
| 2,753 | Complaint-driven Training Data Debugging for Query 2.0 | 2020 | SIGMOD | 8.1724339e-05 |
| 2,968 | Raha: A Configuration-Free Error Detection System | 2019 | SIGMOD | 7.7985097e-05 |
| 3,252 | Auto-Suggest: Learning-to-Recommend Data Preparation Steps Using Data Science Notebooks | 2020 | SIGMOD | 7.3178277e-05 |
| 3,299 | SCODED: Statistical Constraint Oriented Data Error Detection | 2020 | SIGMOD | 7.2546659e-05 |
| 3,396 | Automatic Data Repair: Are We Ready to Deploy? | 2024 | VLDB | 7.1455126e-05 |
| 3,976 | UGuide – User-Guided Discovery of FD-Detectable Errors | 2017 | SIGMOD | 6.5736462e-05 |
| 5,028 | Adaptive Data Augmentation for Supervised Learning over Missing Data | 2021 | VLDB | 5.7506746e-05 |
| 5,096 | Auto-Transform: Learning-to-Transform by Patterns | 2020 | VLDB | 5.7011825e-05 |
| 5,192 | Pattern Functional Dependencies for Data Cleaning | 2020 | VLDB | 5.6375087e-05 |
| 5,429 | DiffPrep: Differentiable Data Preprocessing Pipeline Search for Learning over Tabular Data | 2023 | SIGMOD | 5.5087325e-05 |
| 5,445 | QFix: Diagnosing Errors through Query Histories | 2017 | SIGMOD | 5.5020909e-05 |
| 5,928 | SchemaPile: A Large Collection of Relational Database Schemas | 2024 | SIGMOD | 5.2685946e-05 |
| 6,187 | Semi-Supervised Data Cleaning with Raha and Baran | 2021 | CIDR | 5.1656857e-05 |
| 6,280 | Self-supervised and Interpretable Data Cleaning with Sequence Generative Adversarial Networks | 2023 | VLDB | 5.1290457e-05 |
| 6,295 | Your notebook is not crumby enough, REPLace it | 2020 | CIDR | 5.1249204e-05 |
| 6,546 | Properties of Inconsistency Measures for Databases | 2021 | SIGMOD | 5.0185588e-05 |
| 7,391 | Time Series Data Validity | 2023 | SIGMOD | 4.7429293e-05 |
| 7,564 | PIClean: A Probabilistic and Interactive Data Cleaning System | 2019 | SIGMOD | 4.7093702e-05 |
| 8,208 | SMARTFEAT: Efficient Feature Construction through Feature-Level Foundation Model Interactions | 2024 | CIDR | 4.5581306e-05 |
| 8,472 | Rapidash: Efficient Detection of Constraint Violations | 2024 | VLDB | 4.5036378e-05 |
| 8,590 | Exploratory Training: When Annotators Learn About Data | 2023 | SIGMOD | 4.4896282e-05 |
| 8,678 | Progressive Deep Web Crawling Through Keyword Queries For Data Enrichment | 2019 | SIGMOD | 4.4702119e-05 |
| 8,743 | CtxPipe: Context-aware Data Preparation Pipeline Construction for Machine Learning | 2024 | SIGMOD | 4.456315e-05 |
| 9,056 | A Data Quality Metric (DQM): How to Estimate the Number of Undetected Errors in Data Sets | 2017 | VLDB | 4.4039656e-05 |
| 9,077 | VerifAI: Verified Generative AI | 2024 | CIDR | 4.4010762e-05 |
| 9,118 | Towards Observability for Production Machine Learning Pipelines | 2022 | VLDB | 4.3928288e-05 |
| 9,306 | Debugging Large-Scale Data Science Pipelines using Dagger | 2020 | VLDB | 4.3572942e-05 |
| 9,348 | GIDCL: A Graph-Enhanced Interpretable Data Cleaning Framework with Large Language Models | 2024 | SIGMOD | 4.3526427e-05 |
| 9,479 | Data Imputation with Limited Data Redundancy Using Data Lakes | 2025 | VLDB | 4.3341665e-05 |
| 9,577 | CoClean: Collaborative Data Cleaning | 2020 | SIGMOD | 4.3248438e-05 |
| 9,856 | In-Database Data Imputation | 2024 | SIGMOD | 4.269353e-05 |
| 9,928 | Fainder: A Fast and Accurate Index for Distribution-Aware Dataset Search | 2024 | VLDB | 4.2511622e-05 |
| 9,984 | Towards Scalable Visual Data Wrangling via Direct Manipulation | 2026 | CIDR | 4.1945683e-05 |
| 10,026 | Minimum Change ≠ Best Cleaning: Parallel and Incremental Error Detection under Integrity Constraints | 2026 | SIGMOD | 4.1945683e-05 |
| 10,463 | Zorro: Quantifying Uncertainty in Models & Predictions Arising from Dirty Data | 2025 | SIGMOD | 4.1945683e-05 |
| 10,723 | UniClean: A Scalable Data Cleaning Solution for Mixed Errors based on Unified Cleaners and Optimized Cleaning Workflow | 2025 | VLDB | 4.1945683e-05 |
| 10,821 | Demonstrating Matelda for Multi-Table Error Detection | 2025 | VLDB | 4.1945683e-05 |
| 11,216 | Demystifying the QoS and QoE of Edge-hosted Video Streaming Applications in the Wild with SNESet | 2023 | SIGMOD | 4.1945683e-05 |
| 11,529 | GEDet: Detecting Erroneous Nodes with A Few Examples | 2021 | VLDB | 4.1945683e-05 |