| 1,420 | Data Management Challenges in Production Machine Learning | 2017 | SIGMOD | 0.00012057956 |
| 1,532 | Data Management in Machine Learning: Challenges, Techniques, and Systems | 2017 | SIGMOD | 0.00011472681 |
| 1,894 | Baran: Effective Error Correction via a Unified Context Representation and Transfer Learning | 2020 | VLDB | 0.0001018378 |
| 2,302 | Nearest Neighbor Classifiers over Incomplete Information: From Certain Answers to Certain Predictions | 2021 | VLDB | 9.0668832e-05 |
| 2,506 | Auto-Detect: Data-Driven Error Detection in Tables | 2018 | SIGMOD | 8.6335464e-05 |
| 2,753 | Complaint-driven Training Data Debugging for Query 2.0 | 2020 | SIGMOD | 8.1724339e-05 |
| 2,839 | VolcanoML: Speeding up End-to-End AutoML via Scalable Search Space Decomposition | 2021 | VLDB | 8.0378978e-05 |
| 2,968 | Raha: A Configuration-Free Error Detection System | 2019 | SIGMOD | 7.7985097e-05 |
| 3,396 | Automatic Data Repair: Are We Ready to Deploy? | 2024 | VLDB | 7.1455126e-05 |
| 3,473 | AI Meets Database: AI4DB and DB4AI | 2021 | SIGMOD | 7.062864e-05 |
| 3,773 | Cleaning Crowdsourced Labels Using Oracles for Statistical Classification | 2019 | VLDB | 6.7758649e-05 |
| 4,102 | GoodCore: Data-effective and Data-efficient Machine Learning through Coreset Selection over Incomplete Data | 2023 | SIGMOD | 6.4522929e-05 |
| 4,273 | Cleaning Denial Constraint Violations through Relaxation | 2020 | SIGMOD | 6.3003864e-05 |
| 4,424 | PrIU: A Provenance-Based Approach for Incrementally Updating Regression Models | 2020 | SIGMOD | 6.198474e-05 |
| 4,607 | Data Integration and Machine Learning: A Natural Synergy | 2018 | SIGMOD | 6.0538827e-05 |
| 4,935 | OmniFair: A Declarative System for Model-Agnostic Group Fairness in Machine Learning | 2021 | SIGMOD | 5.8198727e-05 |
| 5,222 | Enabling SQL-based Training Data Debugging for Federated Learning | 2022 | VLDB | 5.6210545e-05 |
| 5,429 | DiffPrep: Differentiable Data Preprocessing Pipeline Search for Learning over Tabular Data | 2023 | SIGMOD | 5.5087325e-05 |
| 5,978 | Rotom: A Meta-Learned Data Augmentation Framework for Entity Matching, Data Cleaning, Text Classification, and Beyond | 2021 | SIGMOD | 5.2453012e-05 |
| 6,263 | Equitable Data Valuation Meets the Right to Be Forgotten in Model Markets | 2023 | VLDB | 5.1349507e-05 |
| 7,796 | CHEF: A Cheap and Fast Pipeline for Iteratively Cleaning Label Uncertainties | 2021 | VLDB | 4.6482625e-05 |
| 7,867 | Learning Over Dirty Data Without Cleaning | 2020 | SIGMOD | 4.6320452e-05 |
| 8,092 | Saga: A Scalable Framework for Optimizing Data Cleaning Pipelines for Machine Learning Applications | 2023 | SIGMOD | 4.587921e-05 |
| 8,182 | SHiFT: An Efficient, Flexible Search Engine for Transfer Learning | 2023 | VLDB | 4.5659133e-05 |
| 8,257 | Automating and Optimizing Data-Centric What-If Analyses on Native Machine Learning Pipelines | 2023 | SIGMOD | 4.5487511e-05 |
| 8,590 | Exploratory Training: When Annotators Learn About Data | 2023 | SIGMOD | 4.4896282e-05 |
| 8,743 | CtxPipe: Context-aware Data Preparation Pipeline Construction for Machine Learning | 2024 | SIGMOD | 4.456315e-05 |
| 8,840 | The Cost of Representation by Subset Repairs | 2025 | VLDB | 4.4388652e-05 |
| 9,043 | Query-Guided Resolution in Uncertain Databases | 2023 | SIGMOD | 4.4039656e-05 |
| 9,054 | Selecting Data to Clean for Fact Checking: Minimizing Uncertainty vs. Maximizing Surprise | 2019 | VLDB | 4.4039656e-05 |
| 9,118 | Towards Observability for Production Machine Learning Pipelines | 2022 | VLDB | 4.3928288e-05 |
| 9,348 | GIDCL: A Graph-Enhanced Interpretable Data Cleaning Framework with Large Language Models | 2024 | SIGMOD | 4.3526427e-05 |
| 9,389 | DataVinci: Learning Syntactic and Semantic String Repairs | 2025 | SIGMOD | 4.3441378e-05 |
| 10,026 | Minimum Change ≠ Best Cleaning: Parallel and Incremental Error Detection under Integrity Constraints | 2026 | SIGMOD | 4.1945683e-05 |
| 10,029 | Outliers: The Good, the Bad and the Ugly | 2026 | SIGMOD | 4.1945683e-05 |
| 10,478 | Data Enhancement for Binary Classification of Relational Data | 2025 | SIGMOD | 4.1945683e-05 |
| 10,528 | Two Birds with One Stone: Efficient Deep Learning over Mislabeled Data through Subset Selection | 2025 | SIGMOD | 4.1945683e-05 |
| 10,617 | Deduplicated Sampling On-Demand | 2025 | VLDB | 4.1945683e-05 |
| 10,628 | CatDB: Data-catalog-guided, LLM-based Generation of Data-centric ML Pipelines | 2025 | VLDB | 4.1945683e-05 |
| 10,644 | Still More Shades of Null: An Evaluation Suite for Responsible Missing Value Imputation | 2025 | VLDB | 4.1945683e-05 |
| 10,816 | mlidea: Interactively Improving ML Data Preparation Code via "Shadow Pipelines" | 2025 | VLDB | 4.1945683e-05 |
| 10,953 | Certain and Approximately Certain Models for Statistical Learning | 2024 | SIGMOD | 4.1945683e-05 |
| 11,052 | Efficiently Mitigating the Impact of Data Drift on Machine Learning Pipelines | 2024 | VLDB | 4.1945683e-05 |
| 11,137 | Generalizable Data Cleaning of Tabular Data in Latent Space | 2024 | VLDB | 4.1945683e-05 |
| 11,178 | LinCQA: Faster Consistent Query Answering with Linear Time Guarantees | 2023 | SIGMOD | 4.1945683e-05 |
| 11,431 | Ease.ML: A Lifecycle Management System for MLDev and MLOps | 2021 | CIDR | 4.1945683e-05 |
| 11,682 | IHCS: An Integrated Hybrid Cleaning System | 2019 | VLDB | 4.1945683e-05 |