Scaling Up Crowd-Sourcing to Very Large Datasets: A Case for Active Learning
Summary: A scalable active-learning approach for crowd-sourced databases that combines machine learning with human labeling, using the nonparametric bootstrap. Experiments on Amazon Mechanical Turk (MTurk) and 15 datasets show that it asks 1–2 orders of magnitude fewer questions than baselines and runs 4.5–44× faster than prior active-learning algorithms. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Barzan Mozafari
2. Purna Sarkar
3. Michael Franklin
4. Michael Jordan
5. Samuel Madden
Incoming Citations (Sorted by Pagerank)
Showing 17 of 17 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 8 of 8 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 94 | CrowdDB: Answering Queries with Crowdsourcing | 2011 | SIGMOD | 0.00051013264 |
| 263 | CrowdER: Crowdsourcing Entity Resolution | 2012 | VLDB | 0.00029862413 |
| 267 | Human-powered Sorts and Joins | 2012 | VLDB | 0.00029690405 |
| 509 | On Active Learning of Record Matching Packages | 2010 | SIGMOD | 0.00021409518 |
| 1,164 | CrowdScreen: Algorithms for Filtering Data with Humans | 2012 | SIGMOD | 0.00013564823 |
| 2,334 | Counting with the Crowd | 2013 | VLDB | 9.0161817e-05 |
| 2,365 | The Analytical Bootstrap: a New Method for Fast Error Estimation in Approximate Query Processing | 2014 | SIGMOD | 8.9551432e-05 |
| 5,868 | ABS: a System for Scalable Approximate Queries with Accuracy Guarantees | 2014 | SIGMOD | 5.2959352e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 8,362 | Minimizing Efforts in Validating Crowd Answers | 2015 | SIGMOD | 4.5366717e-05 |
| 1,841 | Crowdsourcing Algorithms for Entity Resolution | 2014 | VLDB | 0.00010348858 |
| 7,117 | Crowdsourced Data Management: Overview and Challenges | 2017 | SIGMOD | 4.826509e-05 |
| 4,579 | Crowdsourced Top-k Algorithms: An Experimental Evaluation | 2016 | VLDB | 6.070469e-05 |
| 6,868 | Cost-Effective Data Annotation using Game-Based Crowdsourcing | 2019 | VLDB | 4.9010083e-05 |
| 1,491 | CDAS: A Crowdsourcing Data Analytics System | 2012 | VLDB | 0.00011694982 |
| 8,343 | CrowdGame: A Game-Based Crowdsourcing System for Cost-Effective Data Labeling | 2019 | SIGMOD | 4.5429217e-05 |
| 1,242 | Question Selection for Crowd Entity Resolution | 2013 | VLDB | 0.00013096655 |
| 5,734 | Efficient Algorithms for Crowd-Aided Categorization | 2020 | VLDB | 5.3482904e-05 |
| 4,827 | An Online Cost Sensitive Decision-Making Method in Crowdsourcing Systems | 2013 | SIGMOD | 5.8938399e-05 |