| 112 | Potter's Wheel: An Interactive Data Cleaning System | 2001 | VLDB | 0.00047045036 |
| 192 | HoloClean: Holistic Data Repairs with Probabilistic Inference | 2017 | VLDB | 0.00035728858 |
| 489 | Data Curation at Scale: The Data Tamer System | 2013 | CIDR | 0.00022030728 |
| 656 | ERACER: A Database Approach for Statistical Inference and Data Cleaning | 2010 | SIGMOD | 0.00018588729 |
| 683 | Cerebro: A Data System for Optimized Deep Learning Model Selection | 2020 | VLDB | 0.00018195476 |
| 791 | ActiveClean: Interactive Data Cleaning For Statistical Modeling | 2016 | VLDB | 0.00016629664 |
| 833 | Guided Data Repair | 2011 | VLDB | 0.00016138432 |
| 881 | Don’t be SCAREd: Use SCalable Automatic REpairing with Maximal Likelihood and Bounded Changes | 2013 | SIGMOD | 0.00015661103 |
| 921 | Democratizing Data Science through Interactive Curation of ML Pipelines | 2019 | SIGMOD | 0.00015337438 |
| 1,012 | NADEEF: A Commodity Data Cleaning System | 2013 | SIGMOD | 0.0001464733 |
| 1,078 | Model Management 2.0: Manipulating Richer Mappings | 2007 | SIGMOD | 0.00014245848 |
| 1,277 | The Data Civilizer System | 2017 | CIDR | 0.00012879695 |
| 1,337 | HoloDetect: Few-Shot Learning for Error Detection | 2019 | SIGMOD | 0.00012497164 |
| 1,391 | Ease.ml: Towards Multi-tenant Resource Sharing for Machine Learning Workloads | 2018 | VLDB | 0.0001223506 |
| 1,402 | Hybrid Parallelization Strategies for Large-Scale Machine Learning in SystemML | 2014 | VLDB | 0.00012180605 |
| 1,420 | Data Management Challenges in Production Machine Learning | 2017 | SIGMOD | 0.00012057956 |
| 1,482 | Automating Large-Scale Data Quality Verification | 2018 | VLDB | 0.00011725533 |
| 1,527 | Generic Schema Matching, Ten Years Later | 2011 | VLDB | 0.00011499442 |
| 1,666 | HELIX: Holistic Optimization for Accelerating Iterative Machine Learning | 2019 | VLDB | 0.0001096361 |
| 1,894 | Baran: Effective Error Correction via a Unified Context Representation and Transfer Learning | 2020 | VLDB | 0.0001018378 |
| 1,940 | SliceLine: Fast, Linear-Algebra-based Slice Finding for ML Model Debugging | 2021 | SIGMOD | 0.00010020173 |
| 2,122 | SystemDS: A Declarative Machine Learning System for the End-to-End Data Science Lifecycle | 2020 | CIDR | 9.4989076e-05 |
| 2,302 | Nearest Neighbor Classifiers over Incomplete Information: From Certain Answers to Certain Predictions | 2021 | VLDB | 9.0668832e-05 |
| 2,573 | Query Optimization for Dynamic Imputation | 2017 | VLDB | 8.518235e-05 |
| 2,946 | BigDansing: A System for Big Data Cleansing | 2015 | SIGMOD | 7.8372441e-05 |
| 2,968 | Raha: A Configuration-Free Error Detection System | 2019 | SIGMOD | 7.7985097e-05 |
| 3,133 | Time Series Data Cleaning: From Anomaly Detection to Anomaly Repairing | 2017 | VLDB | 7.4978041e-05 |
| 3,491 | TensorFlow Data Validation: Data Analysis and Validation in Continuous ML Pipelines | 2020 | SIGMOD | 7.0451276e-05 |
| 3,528 | Distributed Data Deduplication | 2016 | VLDB | 7.0066139e-05 |
| 4,110 | Learning to Validate the Predictions of Black Box Classifiers on Unseen Data | 2020 | SIGMOD | 6.4428544e-05 |
| 4,464 | Magellan: Toward Building Entity Matching Management Systems over Data Science Stacks | 2016 | VLDB | 6.1606042e-05 |
| 4,749 | Slice Tuner: A Selective Data Acquisition Framework for Accurate and Fair Machine Learning Models | 2021 | SIGMOD | 5.9503689e-05 |
| 4,769 | Automated Feature Engineering for Algorithmic Fairness | 2021 | VLDB | 5.934329e-05 |
| 4,774 | LIMA: Fine-grained Lineage Tracing and Reuse in Machine Learning Systems | 2021 | SIGMOD | 5.9316087e-05 |
| 4,989 | BEER: Blocking for Effective Entity Resolution | 2021 | SIGMOD | 5.7827362e-05 |
| 5,050 | xPAD: A Platform for Analytic Data Flows | 2013 | SIGMOD | 5.7340229e-05 |
| 5,729 | KATARA: Reliable Data Cleaning with Knowledge Bases and Crowdsourcing | 2015 | VLDB | 5.3506368e-05 |
| 5,806 | BlinkML: Efficient Maximum Likelihood Estimation with Probabilistic Guarantees | 2019 | SIGMOD | 5.3200643e-05 |
| 6,102 | QoX-Driven ETL Design: Reducing the Cost of ETL Consulting Engagements | 2009 | SIGMOD | 5.2087887e-05 |
| 6,993 | Unit Testing Data with Deequ | 2019 | SIGMOD | 4.8693227e-05 |
| 7,450 | SystemER: A Human-in-the-loop System for Explainable Entity Resolution | 2019 | VLDB | 4.7265276e-05 |
| 9,001 | The Power of Nested Parallelism in Big Data Processing – Hitting Three Flies with One Slap – | 2021 | SIGMOD | 4.4107627e-05 |
| 9,927 | AlphaEvolve: A Learning Framework to Discover Novel Alphas in Quantitative Investment | 2021 | SIGMOD | 4.2532819e-05 |