Database Paper Browser

Scaling Up Crowd-Sourcing to Very Large Datasets: A Case for Active Learning

Summary: Scalable active learning for crowd-sourced databases, combining machine learning with human labeling via the nonparametric bootstrap. Experiments on Amazon Mechanical Turk and 15 datasets show that the algorithms ask 1–2 orders of magnitude fewer questions than baselines and 4.5–44× fewer than prior active-learning techniques. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 11003
Venue: VLDB
Year: 2015
Pagerank: 7.5379338e-05
Overall Rank: 3,118 | 78.32%
DOI: -

Incoming Citations (Sorted by Pagerank)

Showing 17 of 17 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
791 | ActiveClean: Interactive Data Cleaning For Statistical Modeling | 2016 | VLDB | 0.00016629664
1,204 | VerdictDB: Universalizing Approximate Query Processing | 2018 | SIGMOD | 0.00013319541
1,627 | Data Cleaning: Overview and Emerging Challenges | 2016 | SIGMOD | 0.00011086905
2,175 | Falcon: Scaling Up Hands-Off Crowdsourced Entity Matching to Build Cloud Services | 2017 | SIGMOD | 9.3644117e-05
2,767 | A Comprehensive Benchmark Framework for Active Learning Methods in Entity Matching | 2020 | SIGMOD | 8.1513883e-05
3,142 | Active Learning for ML Enhanced Database Systems | 2020 | SIGMOD | 7.4815444e-05
3,773 | Cleaning Crowdsourced Labels Using Oracles for Statistical Classification | 2019 | VLDB | 6.7758649e-05
4,451 | CLAMShell: Speeding up Crowds for Low-latency Data Labeling | 2016 | VLDB | 6.1738675e-05
5,282 | Deep Indexed Active Learning for Matching Heterogeneous Entity Representations | 2022 | VLDB | 5.5864206e-05
5,896 | In Search of an Entity Resolution OASIS: Optimal Asymptotic Sequential Importance Sampling | 2017 | VLDB | 5.2847867e-05
7,117 | Crowdsourced Data Management: Overview and Challenges | 2017 | SIGMOD | 4.826509e-05
7,648 | User Guidance for Efficient Fact Checking | 2019 | VLDB | 4.6889787e-05
9,460 | The Battleship Approach to the Low Resource Entity Matching Problem | 2023 | SIGMOD | 4.3366491e-05
9,896 | Towards Interpretable and Learnable Risk Analysis for Entity Resolution | 2020 | SIGMOD | 4.2600049e-05
11,230 | VersaMatch: Ontology Matching with Weak Supervision | 2023 | VLDB | 4.1945683e-05
11,593 | Recommending Deployment Strategies for Collaborative Tasks | 2020 | SIGMOD | 4.1945683e-05
11,770 | Staging User Feedback toward Rapid Conflict Resolution in Data Fusion | 2017 | SIGMOD | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 8 of 8 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
94 | CrowdDB: Answering Queries with Crowdsourcing | 2011 | SIGMOD | 0.00051013264
263 | CrowdER: Crowdsourcing Entity Resolution | 2012 | VLDB | 0.00029862413
267 | Human-powered Sorts and Joins | 2012 | VLDB | 0.00029690405
509 | On Active Learning of Record Matching Packages | 2010 | SIGMOD | 0.00021409518
1,164 | CrowdScreen: Algorithms for Filtering Data with Humans | 2012 | SIGMOD | 0.00013564823
2,334 | Counting with the Crowd | 2013 | VLDB | 9.0161817e-05
2,365 | The Analytical Bootstrap: a New Method for Fast Error Estimation in Approximate Query Processing | 2014 | SIGMOD | 8.9551432e-05
5,868 | ABS: a System for Scalable Approximate Queries with Accuracy Guarantees | 2014 | SIGMOD | 5.2959352e-05

Semantically Similar Papers