| 254 | Snorkel: Rapid Training Data Creation with Weak Supervision | 2018 | VLDB | 0.00030540555 |
| 517 | Can Foundation Models Wrangle Your Data? | 2023 | VLDB | 0.00021169035 |
| 1,337 | HoloDetect: Few-Shot Learning for Error Detection | 2019 | SIGMOD | 0.00012497164 |
| 1,482 | Automating Large-Scale Data Quality Verification | 2018 | VLDB | 0.00011725533 |
| 1,894 | Baran: Effective Error Correction via a Unified Context Representation and Transfer Learning | 2020 | VLDB | 0.0001018378 |
| 2,122 | SystemDS: A Declarative Machine Learning System for the End-to-End Data Science Lifecycle | 2020 | CIDR | 9.4989076e-05 |
| 2,158 | Uni-Detect: A Unified Approach to Automated Error Detection in Tables | 2019 | SIGMOD | 9.4141354e-05 |
| 2,280 | SMOKE: Fine-grained Lineage at Interactive Speed | 2018 | VLDB | 9.1111033e-05 |
| 2,302 | Nearest Neighbor Classifiers over Incomplete Information: From Certain Answers to Certain Predictions | 2021 | VLDB | 9.0668832e-05 |
| 2,349 | RPT: Relational Pre-trained Transformer Is Almost All You Need towards Democratizing Data Preparation | 2021 | VLDB | 8.9876423e-05 |
| 2,483 | Discovery of Approximate (and Exact) Denial Constraints | 2020 | VLDB | 8.6864916e-05 |
| 2,506 | Auto-Detect: Data-Driven Error Detection in Tables | 2018 | SIGMOD | 8.6335464e-05 |
| 2,566 | Database Repairs and Consistent Query Answering: Origins and Further Developments | 2019 | PODS | 8.5243847e-05 |
| 2,587 | Table-GPT: Table Fine-tuned GPT for Diverse Table Tasks | 2024 | SIGMOD | 8.4924618e-05 |
| 2,753 | Complaint-driven Training Data Debugging for Query 2.0 | 2020 | SIGMOD | 8.1724339e-05 |
| 2,839 | VolcanoML: Speeding up End-to-End AutoML via Scalable Search Space Decomposition | 2021 | VLDB | 8.0378978e-05 |
| 2,958 | The Role of Massively Multi-Task and Weak Supervision in Software 2.0 | 2019 | CIDR | 7.8173975e-05 |
| 2,968 | Raha: A Configuration-Free Error Detection System | 2019 | SIGMOD | 7.7985097e-05 |
| 3,155 | Ten Years of WebTables | 2018 | VLDB | 7.4672742e-05 |
| 3,299 | SCODED: Statistical Constraint Oriented Data Error Detection | 2020 | SIGMOD | 7.2546659e-05 |
| 3,311 | Efficient and Effective Data Imputation with Influence Functions | 2022 | VLDB | 7.2406486e-05 |
| 3,396 | Automatic Data Repair: Are We Ready to Deploy? | 2024 | VLDB | 7.1455126e-05 |
| 3,711 | Saga: A Platform for Continuous Construction and Serving of Knowledge At Scale | 2022 | SIGMOD | 6.823609e-05 |
| 3,825 | Cleanits: A Data Cleaning System for Industrial Time Series | 2019 | VLDB | 6.7255837e-05 |
| 3,831 | Kamino: Constraint-Aware Differentially Private Data Synthesis | 2021 | VLDB | 6.7181688e-05 |
| 4,127 | A Statistical Perspective on Discovering Functional Dependencies in Noisy Data | 2020 | SIGMOD | 6.4310458e-05 |
| 4,273 | Cleaning Denial Constraint Violations through Relaxation | 2020 | SIGMOD | 6.3003864e-05 |
| 4,471 | GOGGLES: Automatic Image Labeling with Affinity Coding | 2020 | SIGMOD | 6.1555681e-05 |
| 4,607 | Data Integration and Machine Learning: A Natural Synergy | 2018 | SIGMOD | 6.0538827e-05 |
| 5,028 | Adaptive Data Augmentation for Supervised Learning over Missing Data | 2021 | VLDB | 5.7506746e-05 |
| 5,096 | Auto-Transform: Learning-to-Transform by Patterns | 2020 | VLDB | 5.7011825e-05 |
| 5,153 | Horizon: Scalable Dependency-driven Data Cleaning | 2021 | VLDB | 5.6607963e-05 |
| 5,222 | Enabling SQL-based Training Data Debugging for Federated Learning | 2022 | VLDB | 5.6210545e-05 |
| 5,251 | Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale | 2019 | SIGMOD | 5.6029615e-05 |
| 5,412 | Mining an "Anti-Knowledge Base" from Wikipedia Updates with Applications to Fact Checking and Beyond | 2020 | VLDB | 5.5207515e-05 |
| 5,978 | Rotom: A Meta-Learned Data Augmentation Framework for Entity Matching, Data Cleaning, Text Classification, and Beyond | 2021 | SIGMOD | 5.2453012e-05 |
| 6,134 | Finding Label and Model Errors in Perception Data With Learned Observation Assertions | 2022 | SIGMOD | 5.1943414e-05 |
| 6,187 | Semi-Supervised Data Cleaning with Raha and Baran | 2021 | CIDR | 5.1656857e-05 |
| 6,280 | Self-supervised and Interpretable Data Cleaning with Sequence Generative Adversarial Networks | 2023 | VLDB | 5.1290457e-05 |
| 6,451 | Multivariate Time Series Cleaning under Speed Constraints | 2024 | SIGMOD | 5.0583324e-05 |
| 6,477 | Fast Algorithms for Denial Constraint Discovery | 2023 | VLDB | 5.0488285e-05 |
| 6,546 | Properties of Inconsistency Measures for Databases | 2021 | SIGMOD | 5.0185588e-05 |
| 6,553 | How do Categorical Duplicates Affect ML? A New Benchmark and Empirical Analyses | 2024 | VLDB | 5.0157344e-05 |
| 6,683 | Probabilistic Databases for All | 2020 | PODS | 4.9638979e-05 |
| 6,690 | Parallel Discrepancy Detection and Incremental Detection | 2021 | VLDB | 4.9621556e-05 |
| 6,887 | Synthesizing Linked Data Under Cardinality and Integrity Constraints | 2021 | SIGMOD | 4.8937852e-05 |
| 6,944 | DataPrism: Exposing Disconnect between Data and Systems | 2022 | SIGMOD | 4.8912787e-05 |
| 7,066 | On Multiple Semantics for Declarative Database Repairs | 2020 | SIGMOD | 4.8445108e-05 |
| 7,223 | Akane: Perplexity-Guided Time Series Data Cleaning | 2024 | SIGMOD | 4.7965857e-05 |
| 7,243 | Data Integration and Machine Learning: A Natural Synergy | 2018 | VLDB | 4.7913666e-05 |