| 300 | Deep Learning for Entity Matching: A Design Space Exploration | 2018 | SIGMOD | 0.00028441466 |
| 1,116 | Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes | 2024 | VLDB | 0.00013890154 |
| 1,215 | Snuba: Automating Weak Supervision to Label Training Data | 2019 | VLDB | 0.0001323375 |
| 1,337 | HoloDetect: Few-Shot Learning for Error Detection | 2019 | SIGMOD | 0.00012497164 |
| 1,666 | HELIX: Holistic Optimization for Accelerating Iterative Machine Learning | 2019 | VLDB | 0.0001096361 |
| 1,940 | SliceLine: Fast, Linear-Algebra-based Slice Finding for ML Model Debugging | 2021 | SIGMOD | 0.00010020173 |
| 1,993 | Automatically Generating Data Exploration Sessions Using Deep Reinforcement Learning | 2020 | SIGMOD | 9.8453334e-05 |
| 2,321 | DBPal: A Fully Pluggable NL2SQL Training Pipeline | 2020 | SIGMOD | 9.03609e-05 |
| 2,825 | Smile: A System to Support Machine Learning on EEG Data at Scale | 2019 | VLDB | 8.0563426e-05 |
| 2,839 | VolcanoML: Speeding up End-to-End AutoML via Scalable Search Space Decomposition | 2021 | VLDB | 8.0378978e-05 |
| 2,958 | The Role of Massively Multi-Task and Weak Supervision in Software 2.0 | 2019 | CIDR | 7.8173975e-05 |
| 3,303 | Fonduer: Knowledge Base Construction from Richly Formatted Data | 2018 | SIGMOD | 7.2487486e-05 |
| 3,508 | SPADE: Synthesizing Data Quality Assertions for Large Language Model Pipelines | 2024 | VLDB | 7.0271496e-05 |
| 3,942 | Ember: No-Code Context Enrichment via Similarity-Based Keyless Joins | 2022 | VLDB | 6.6114622e-05 |
| 4,196 | Overton: A Data System for Monitoring and Improving Machine-Learned Products | 2020 | CIDR | 6.3686231e-05 |
| 4,456 | AutoOD: Automatic Outlier Detection | 2023 | SIGMOD | 6.1704203e-05 |
| 4,471 | GOGGLES: Automatic Image Labeling with Affinity Coding | 2020 | SIGMOD | 6.1555681e-05 |
| 4,590 | MB2: Decomposed Behavior Modeling for Self-Driving Database Management Systems | 2021 | SIGMOD | 6.0620053e-05 |
| 4,607 | Data Integration and Machine Learning: A Natural Synergy | 2018 | SIGMOD | 6.0538827e-05 |
| 4,751 | ODIN: Automated Drift Detection and Recovery in Video Analytics | 2020 | VLDB | 5.9485403e-05 |
| 4,872 | Explainable AI: Foundations, Applications, Opportunities for Data Management Research | 2022 | SIGMOD | 5.8609352e-05 |
| 4,935 | OmniFair: A Declarative System for Model-Agnostic Group Fairness in Machine Learning | 2021 | SIGMOD | 5.8198727e-05 |
| 5,242 | Towards Benchmarking Feature Type Inference for AutoML Platforms | 2021 | SIGMOD | 5.6074743e-05 |
| 5,251 | Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale | 2019 | SIGMOD | 5.6029615e-05 |
| 5,347 | Adaptive Rule Discovery for Labeling Text Data | 2021 | SIGMOD | 5.5560452e-05 |
| 5,381 | Selective Data Acquisition in the Wild for Model Charging | 2022 | VLDB | 5.5399508e-05 |
| 5,412 | Mining an "Anti-Knowledge Base" from Wikipedia Updates with Applications to Fact Checking and Beyond | 2020 | VLDB | 5.5207515e-05 |
| 5,869 | Demonstration of Panda: A Weakly Supervised Entity Matching System | 2021 | VLDB | 5.2959029e-05 |
| 5,963 | Automatic Data Acquisition for Deep Learning | 2021 | VLDB | 5.2526794e-05 |
| 5,978 | Rotom: A Meta-Learned Data Augmentation Framework for Entity Matching, Data Cleaning, Text Classification, and Beyond | 2021 | SIGMOD | 5.2453012e-05 |
| 6,042 | MDedup: Duplicate Detection with Matching Dependencies | 2020 | VLDB | 5.2405269e-05 |
| 6,130 | VOCAL: Video Organization and Interactive Compositional AnaLytics | 2022 | CIDR | 5.1962107e-05 |
| 6,134 | Finding Label and Model Errors in Perception Data With Learned Observation Assertions | 2022 | SIGMOD | 5.1943414e-05 |
| 6,228 | Managing ML Pipelines: Feature Stores and the Coming Wave of Embedding Ecosystems | 2021 | VLDB | 5.1470042e-05 |
| 6,247 | Optimizing In-memory Database Engine for AI-powered On-line Decision Augmentation Using Persistent Memory | 2021 | VLDB | 5.1389201e-05 |
| 6,519 | Expand your Training Limits! Generating Training Data for ML-based Data Management | 2021 | SIGMOD | 5.0316686e-05 |
| 6,868 | Cost-Effective Data Annotation using Game-Based Crowdsourcing | 2019 | VLDB | 4.9010083e-05 |
| 7,243 | Data Integration and Machine Learning: A Natural Synergy | 2018 | VLDB | 4.7913666e-05 |
| 7,288 | Witan: Unsupervised Labelling Function Generation for Assisted Data Programming | 2022 | VLDB | 4.7762276e-05 |
| 7,643 | Cross Modal Data Discovery over Structured and Unstructured Data Lakes | 2023 | VLDB | 4.6901105e-05 |
| 7,656 | Nautilus: An Optimized System for Deep Transfer Learning over Evolving Training Datasets | 2022 | SIGMOD | 4.6871575e-05 |
| 7,796 | CHEF: A Cheap and Fast Pipeline for Iteratively Cleaning Label Uncertainties | 2021 | VLDB | 4.6482625e-05 |
| 8,055 | iFlipper: Label Flipping for Individual Fairness | 2023 | SIGMOD | 4.5947404e-05 |
| 8,182 | SHiFT: An Efficient, Flexible Search Engine for Transfer Learning | 2023 | VLDB | 4.5659133e-05 |
| 8,292 | Nemo: Guiding and Contextualizing Weak Supervision for Interactive Data Programming | 2022 | VLDB | 4.5435639e-05 |
| 8,343 | CrowdGame: A Game-Based Crowdsourcing System for Cost-Effective Data Labeling | 2019 | SIGMOD | 4.5429217e-05 |
| 8,514 | UPLIFT: Parallelization Strategies for Feature Transformations in Machine Learning Workloads | 2022 | VLDB | 4.4944285e-05 |
| 8,714 | LANCET: Labeling Complex Data at Scale | 2021 | VLDB | 4.4619818e-05 |
| 9,192 | Hyper-Tune: Towards Efficient Hyper-parameter Tuning at Scale | 2022 | VLDB | 4.3765131e-05 |
| 9,252 | Improving Information Extraction from Visually Rich Documents using Visual Span Representations | 2021 | VLDB | 4.3690661e-05 |