The Role of Massively Multi-Task and Weak Supervision in Software 2.0

Summary: The paper's vision is to program Software 2.0 systems by labeling: declarative weak supervision sources are aggregated by unsupervised label models to generate training data cheaply. It introduces massively multi-task central models that amortize labeling effort across many tasks, and validates the approach through Snorkel deployments (ad fraud, diagnostics). (summarized by gpt-5-mini on Feb 09 2026)

Paper ID: 336
Venue: CIDR
Year: 2019
Pagerank: 7.8103118e-05
Overall Rank: 2,960 | 79.43%
DOI: -

Incoming Citations (Sorted by Pagerank)

Showing 5 of 5 citing papers.

Rank	Citing Paper	Year	Venue	Pagerank
3,212	Panorama: A Data System for Unbounded Vocabulary Querying over Video	2020	VLDB	7.3772955e-05
4,197	Overton: A Data System for Monitoring and Improving Machine-Learned Products	2020	CIDR	6.3625568e-05
4,746	Slice Tuner: A Selective Data Acquisition Framework for Accurate and Fair Machine Learning Models	2021	SIGMOD	5.9446518e-05
9,439	Rock: Cleaning Data by Embedding ML in Logic Rules	2024	SIGMOD	4.3389137e-05
11,547	Migrating a Privacy-Safe Information Extraction System to a Software 2.0 Design	2020	CIDR	4.1905499e-05

Outgoing Citations (Sorted by Pagerank)

Showing 4 of 4 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank	Cited Paper	Year	Venue	Pagerank
192	HoloClean: Holistic Data Repairs with Probabilistic Inference	2017	VLDB	0.00035692958
252	Snorkel: Rapid Training Data Creation with Weak Supervision	2018	VLDB	0.00030532082
1,218	Snuba: Automating Weak Supervision to Label Training Data	2019	VLDB	0.00013221309
5,257	Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale	2019	SIGMOD	5.5975788e-05

Semantically Similar Papers

Overall Rank	Paper	Year	Venue	Pagerank
8,285	Nemo: Guiding and Contextualizing Weak Supervision for Interactive Data Programming	2022	VLDB	4.5392079e-05
4,088	Snorkel: Fast Training Set Generation for Information Extraction	2017	SIGMOD	6.457048e-05
11,633	Leveraging Organizational Resources to Adapt Models to New Data Modalities	2020	VLDB	4.1905499e-05
6,953	Inspector Gadget: A Data Programming-based Labeling System for Industrial Images	2021	VLDB	4.8817419e-05
7,138	Ease.ml/ci and Ease.ml/meter in Action: Towards Data Management for Statistical Generalization	2019	VLDB	4.8164681e-05
6,526	Data Collection and Quality Challenges for Deep Learning	2020	VLDB	5.0219175e-05
9,116	Towards Observability for Production Machine Learning Pipelines	2022	VLDB	4.3886184e-05
11,547	Migrating a Privacy-Safe Information Extraction System to a Software 2.0 Design	2020	CIDR	4.1905499e-05
252	Snorkel: Rapid Training Data Creation with Weak Supervision	2018	VLDB	0.00030532082
5,257	Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale	2019	SIGMOD	5.5975788e-05