Snorkel: Fast Training Set Generation for Information Extraction
Summary: Snorkel generates training sets for information extraction from labeling functions that encode heuristics and other weak supervision, along with accuracy estimation. The approach tolerates noisy labels, and the resulting model achieves high accuracy on extracting corporate relations from news. (summarized by gpt-5-nano on Feb 09 2026)
Incoming Citations (Sorted by Pagerank)
Showing 8 of 8 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 2,753 | Complaint-driven Training Data Debugging for Query 2.0 | 2020 | SIGMOD | 8.1724339e-05 |
| 5,645 | Warper: Efficiently Adapting Learned Cardinality Estimators to Data and Workload Drifts | 2022 | SIGMOD | 5.3923454e-05 |
| 6,955 | Inspector Gadget: A Data Programming-based Labeling System for Industrial Images | 2021 | VLDB | 4.8864297e-05 |
| 7,941 | Efficient Uncertainty Tracking for Complex Queries with Attribute-level Bounds | 2021 | SIGMOD | 4.613363e-05 |
| 8,055 | iFlipper: Label Flipping for Individual Fairness | 2023 | SIGMOD | 4.5947404e-05 |
| 9,252 | Improving Information Extraction from Visually Rich Documents using Visual Span Representations | 2021 | VLDB | 4.3690661e-05 |
| 10,465 | A Cost-Effective LLM-based Approach to Identify Wildlife Trafficking in Online Marketplaces | 2025 | SIGMOD | 4.1945683e-05 |
| 11,543 | Migrating a Privacy-Safe Information Extraction System to a Software 2.0 Design | 2020 | CIDR | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 0 of 0 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 3,640 | Deep Learning for Blocking in Entity Matching: A Design Space Exploration | 2021 | VLDB | 6.8891671e-05 |
| 5,347 | Adaptive Rule Discovery for Labeling Text Data | 2021 | SIGMOD | 5.5560452e-05 |
| 11,256 | Self-Training for Label-Efficient Information Extraction from Semi-Structured Web-Pages | 2023 | VLDB | 4.1945683e-05 |
| 13,132 | Accelerating Tabular Inference: Training Data Generation with TENET | 2025 | VLDB | - |
| 4,106 | Extracting Databases from Dark Data with DeepDive | 2016 | SIGMOD | 6.4456184e-05 |
| 3,635 | A Deep Dive into Deep Learning Approaches for Text-to-SQL Systems | 2021 | SIGMOD | 6.8981006e-05 |
| 9,409 | Ground Truth Inference for Weakly Supervised Entity Matching | 2023 | SIGMOD | 4.3441378e-05 |
| 5,251 | Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale | 2019 | SIGMOD | 5.6029615e-05 |
| 1,215 | Snuba: Automating Weak Supervision to Label Training Data | 2019 | VLDB | 0.0001323375 |
| 254 | Snorkel: Rapid Training Data Creation with Weak Supervision | 2018 | VLDB | 0.00030540555 |