Snorkel: Rapid Training Data Creation with Weak Supervision
Summary: Snorkel lets users train machine learning models from weak supervision rather than hand-labeled data. Users write labeling functions whose accuracies are unknown, and an end-to-end data programming pipeline denoises the resulting labels without ground truth. The system includes a tradeoff optimizer, and the paper reports speedups and accuracy gains over hand labeling. (summarized by gpt-5-nano on Feb 09 2026)
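To make the idea of labeling functions concrete, here is a minimal sketch in plain Python. It does not use the Snorkel library or reproduce the paper's method: the function names, the keyword heuristics, and the `ABSTAIN`/`NEGATIVE`/`POSITIVE` constants are all hypothetical, and a simple majority vote is swapped in for the accuracy-aware denoising the summary describes.

```python
from collections import Counter

# Hypothetical label values for a binary task; -1 means "no opinion".
ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1


def lf_contains_causes(text):
    """Vote POSITIVE when the text contains the word 'causes'."""
    return POSITIVE if "causes" in text.lower() else ABSTAIN


def lf_contains_not(text):
    """Vote NEGATIVE when the text contains the standalone word 'not'."""
    return NEGATIVE if " not " in f" {text.lower()} " else ABSTAIN


def lf_contains_induce(text):
    """Vote POSITIVE when the text mentions 'induce' in any form."""
    return POSITIVE if "induce" in text.lower() else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_causes, lf_contains_not, lf_contains_induce]


def apply_lfs(texts):
    """Build the label matrix: one row per text, one column per labeling function."""
    return [[lf(text) for lf in LABELING_FUNCTIONS] for text in texts]


def majority_vote(votes):
    """Combine one row of votes; abstain when nobody votes or the top votes tie."""
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


if __name__ == "__main__":
    texts = [
        "Smoking causes cancer",
        "Aspirin does not cause headaches",
        "The weather is nice",
    ]
    for text, row in zip(texts, apply_lfs(texts)):
        print(text, row, majority_vote(row))
```

Majority vote treats every labeling function as equally reliable, which is exactly the assumption that breaks down when accuracies are unknown and uneven; it serves here only as the simplest possible baseline for combining noisy votes.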
Authors
- 1. Alexander Ratner
- 2. Stephen H. Bach
- 3. Henry Ehrenberg
- 4. Jason Fries
- 5. Sen Wu
- 6. Christopher Ré
Incoming Citations (Sorted by Pagerank)
Showing 20 of 70 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 5 of 5 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 192 | HoloClean: Holistic Data Repairs with Probabilistic Inference | 2017 | VLDB | 0.00035728858 |
| 371 | A Bayesian Approach to Discovering Truth from Conflicting Sources for Data Integration | 2012 | VLDB | 0.00025389696 |
| 398 | Big Data Integration | 2013 | VLDB | 0.00024372588 |
| 908 | Fusing Data with Correlations | 2014 | SIGMOD | 0.00015431241 |
| 3,897 | SLiMFast: Guaranteed Results for Data Fusion and Source Reliability | 2017 | SIGMOD | 6.6554845e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 8,590 | Exploratory Training: When Annotators Learn About Data | 2023 | SIGMOD | 4.4896282e-05 |
| 9,409 | Ground Truth Inference for Weakly Supervised Entity Matching | 2023 | SIGMOD | 4.3441378e-05 |
| 5,963 | Automatic Data Acquisition for Deep Learning | 2021 | VLDB | 5.2526794e-05 |
| 5,347 | Adaptive Rule Discovery for Labeling Text Data | 2021 | SIGMOD | 5.5560452e-05 |
| 8,292 | Nemo: Guiding and Contextualizing Weak Supervision for Interactive Data Programming | 2022 | VLDB | 4.5435639e-05 |
| 6,955 | Inspector Gadget: A Data Programming-based Labeling System for Industrial Images | 2021 | VLDB | 4.8864297e-05 |
| 2,958 | The Role of Massively Multi-Task and Weak Supervision in Software 2.0 | 2019 | CIDR | 7.8173975e-05 |
| 5,251 | Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale | 2019 | SIGMOD | 5.6029615e-05 |
| 1,215 | Snuba: Automating Weak Supervision to Label Training Data | 2019 | VLDB | 0.0001323375 |
| 4,087 | Snorkel: Fast Training Set Generation for Information Extraction | 2017 | SIGMOD | 6.4607746e-05 |