| 333 | Neo: A Learned Query Optimizer | 2019 | VLDB | 0.00027206884 |
| 517 | Can Foundation Models Wrangle Your Data? | 2023 | VLDB | 0.00021169035 |
| 1,894 | Baran: Effective Error Correction via a Unified Context Representation and Transfer Learning | 2020 | VLDB | 0.0001018378 |
| 2,122 | SystemDS: A Declarative Machine Learning System for the End-to-End Data Science Lifecycle | 2020 | CIDR | 9.4989076e-05 |
| 2,753 | Complaint-driven Training Data Debugging for Query 2.0 | 2020 | SIGMOD | 8.1724339e-05 |
| 3,252 | Auto-Suggest: Learning-to-Recommend Data Preparation Steps Using Data Science Notebooks | 2020 | SIGMOD | 7.3178277e-05 |
| 3,299 | SCODED: Statistical Constraint Oriented Data Error Detection | 2020 | SIGMOD | 7.2546659e-05 |
| 3,396 | Automatic Data Repair: Are We Ready to Deploy? | 2024 | VLDB | 7.1455126e-05 |
| 3,440 | Approximate Denial Constraints | 2020 | VLDB | 7.0918817e-05 |
| 3,711 | Saga: A Platform for Continuous Construction and Serving of Knowledge At Scale | 2022 | SIGMOD | 6.823609e-05 |
| 4,127 | A Statistical Perspective on Discovering Functional Dependencies in Noisy Data | 2020 | SIGMOD | 6.4310458e-05 |
| 4,424 | PrIU: A Provenance-Based Approach for Incrementally Updating Regression Models | 2020 | SIGMOD | 6.198474e-05 |
| 5,096 | Auto-Transform: Learning-to-Transform by Patterns | 2020 | VLDB | 5.7011825e-05 |
| 5,222 | Enabling SQL-based Training Data Debugging for Federated Learning | 2022 | VLDB | 5.6210545e-05 |
| 5,978 | Rotom: A Meta-Learned Data Augmentation Framework for Entity Matching, Data Cleaning, Text Classification, and Beyond | 2021 | SIGMOD | 5.2453012e-05 |
| 6,077 | The Fast and the Private: Task-based Dataset Search | 2024 | CIDR | 5.2229324e-05 |
| 6,134 | Finding Label and Model Errors in Perception Data With Learned Observation Assertions | 2022 | SIGMOD | 5.1943414e-05 |
| 6,187 | Semi-Supervised Data Cleaning with Raha and Baran | 2021 | CIDR | 5.1656857e-05 |
| 6,690 | Parallel Discrepancy Detection and Incremental Detection | 2021 | VLDB | 4.9621556e-05 |
| 7,704 | ExDRa: Exploratory Data Science on Federated Raw Data | 2021 | SIGMOD | 4.6733838e-05 |
| 7,838 | Auto-Validate: Unsupervised Data Validation Using Data-Domain Patterns Inferred from Data Lakes | 2021 | SIGMOD | 4.6377995e-05 |
| 8,092 | Saga: A Scalable Framework for Optimizing Data Cleaning Pipelines for Machine Learning Applications | 2023 | SIGMOD | 4.587921e-05 |
| 8,257 | Automating and Optimizing Data-Centric What-If Analyses on Native Machine Learning Pipelines | 2023 | SIGMOD | 4.5487511e-05 |
| 8,590 | Exploratory Training: When Annotators Learn About Data | 2023 | SIGMOD | 4.4896282e-05 |
| 9,077 | VerifAI: Verified Generative AI | 2024 | CIDR | 4.4010762e-05 |
| 9,348 | GIDCL: A Graph-Enhanced Interpretable Data Cleaning Framework with Large Language Models | 2024 | SIGMOD | 4.3526427e-05 |
| 9,355 | Discovering Top-k Rules using Subjective and Objective Criteria | 2023 | SIGMOD | 4.3514328e-05 |
| 9,389 | DataVinci: Learning Syntactic and Semantic String Repairs | 2025 | SIGMOD | 4.3441378e-05 |
| 9,434 | Rock: Cleaning Data by Embedding ML in Logic Rules | 2024 | SIGMOD | 4.3430376e-05 |
| 9,487 | Making It Tractable to Catch Duplicates and Conflicts in Graphs | 2023 | SIGMOD | 4.3341665e-05 |
| 9,560 | MTSClean: Efficient Constraint-based Cleaning for Multi-Dimensional Time Series Data | 2024 | VLDB | 4.3254416e-05 |
| 9,777 | Data Augmentation for ML-driven Data Preparation and Integration | 2021 | VLDB | 4.2856106e-05 |
| 9,849 | Reptile: Aggregation-level Explanations for Hierarchical Data | 2022 | SIGMOD | 4.2721228e-05 |
| 9,963 | Parallel Rule Discovery from Large Datasets by Sampling | 2022 | SIGMOD | 4.2294678e-05 |
| 9,984 | Towards Scalable Visual Data Wrangling via Direct Manipulation | 2026 | CIDR | 4.1945683e-05 |
| 10,026 | Minimum Change ≠ Best Cleaning: Parallel and Incremental Error Detection under Integrity Constraints | 2026 | SIGMOD | 4.1945683e-05 |
| 10,598 | Auto-Prep: Holistic Prediction of Data Preparation Steps for Self-Service Business Intelligence | 2025 | VLDB | 4.1945683e-05 |
| 10,712 | DobLIX: A Dual-Objective Learned Index for Log-Structured Merge Trees | 2025 | VLDB | 4.1945683e-05 |
| 10,723 | UniClean: A Scalable Data Cleaning Solution for Mixed Errors based on Unified Cleaners and Optimized Cleaning Workflow | 2025 | VLDB | 4.1945683e-05 |
| 10,821 | Demonstrating Matelda for Multi-Table Error Detection | 2025 | VLDB | 4.1945683e-05 |
| 11,000 | MisDetect: Iterative Mislabel Detection using Early Loss | 2024 | VLDB | 4.1945683e-05 |
| 11,137 | Generalizable Data Cleaning of Tabular Data in Latent Space | 2024 | VLDB | 4.1945683e-05 |
| 11,369 | PGE: Robust Product Graph Embedding Learning for Error Detection | 2022 | VLDB | 4.1945683e-05 |