Database Paper Browser

Snorkel: Rapid Training Data Creation with Weak Supervision

Summary: Snorkel lets users train ML models quickly from weak supervision by writing labeling functions whose accuracies are unknown. Its end-to-end data programming pipeline denoises the resulting labels without ground truth and includes an optimizer for the tradeoff between modeling speed and accuracy. The paper reports speedups and accuracy gains over hand labeling. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 11741
Venue: VLDB
Year: 2018
Pagerank: 0.00030540555
Overall Rank: 254 | 98.24%
DOI: 10.14778/3157794.3157797

Authors

Incoming Citations (Sorted by Pagerank)

Showing 20 of 70 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
9,253 | Glean: Structured Extractions from Templatic Documents | 2021 | VLDB | 4.3690661e-05
9,365 | Falcon: Fair Active Learning using Multi-armed Bandits | 2024 | VLDB | 4.3502315e-05
9,409 | Ground Truth Inference for Weakly Supervised Entity Matching | 2023 | SIGMOD | 4.3441378e-05
9,438 | Bootleg: Chasing the Tail with Self-Supervised Named Entity Disambiguation | 2021 | CIDR | 4.3425082e-05
9,777 | Data Augmentation for ML-driven Data Preparation and Integration | 2021 | VLDB | 4.2856106e-05
9,806 | The Image Calculator: 10x Faster Image-AI Inference by Replacing JPEG with Self-designing Storage Format | 2024 | SIGMOD | 4.2805224e-05
9,830 | Towards Autonomous, Hands-Free Data Exploration | 2020 | CIDR | 4.2751057e-05
10,289 | LEAD: Iterative Data Selection for Efficient LLM Instruction Tuning | 2026 | VLDB | 4.1945683e-05
10,291 | Morphing-based Compression for Data-centric ML Pipelines | 2026 | VLDB | 4.1945683e-05
10,533 | WeShap: Weak Supervision Source Evaluation with Shapley Values | 2025 | VLDB | 4.1945683e-05
10,560 | A Systematic Study on Early Stopping Metrics in HPO and the Implications of Uncertainty | 2025 | VLDB | 4.1945683e-05
11,137 | Generalizable Data Cleaning of Tabular Data in Latent Space | 2024 | VLDB | 4.1945683e-05
11,205 | Steered Training Data Generation for Learned Semantic Type Detection | 2023 | SIGMOD | 4.1945683e-05
11,230 | VersaMatch: Ontology Matching with Weak Supervision | 2023 | VLDB | 4.1945683e-05
11,409 | Machine Programming: Turning Data into Programmer Productivity | 2022 | VLDB | 4.1945683e-05
11,431 | Ease.ML: A Lifecycle Management System for MLDev and MLOps | 2021 | CIDR | 4.1945683e-05
11,524 | An Extensible and Reusable Pipeline for Automated Utterance Paraphrases | 2021 | VLDB | 4.1945683e-05
11,538 | Quality of Sentiment Analysis Tools: The Reasons of Inconsistency | 2021 | VLDB | 4.1945683e-05
11,560 | Factorized Graph Representations for Semi-Supervised Learning from Sparse Data | 2020 | SIGMOD | 4.1945683e-05
11,629 | Leveraging Organizational Resources to Adapt Models to New Data Modalities | 2020 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 5 of 5 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
192 | HoloClean: Holistic Data Repairs with Probabilistic Inference | 2017 | VLDB | 0.00035728858
371 | A Bayesian Approach to Discovering Truth from Conflicting Sources for Data Integration | 2012 | VLDB | 0.00025389696
398 | Big Data Integration | 2013 | VLDB | 0.00024372588
908 | Fusing Data with Correlations | 2014 | SIGMOD | 0.00015431241
3,897 | SLiMFast: Guaranteed Results for Data Fusion and Source Reliability | 2017 | SIGMOD | 6.6554845e-05

Semantically Similar Papers