Database Paper Browser

Crowd-Based Deduplication: An Adaptive Approach

Summary: ACD adapts correlation clustering to crowd-based deduplication, adding techniques that speed up crowd work and a postprocessing step that improves accuracy. Experiments on MTurk show higher precision than state-of-the-art methods at a moderate crowdsourcing overhead. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 4959
Venue: SIGMOD
Year: 2015
Pagerank: 6.0444854e-05
Overall Rank: 4,619 | 67.87%
DOI: 10.1145/2723372.2723739

Incoming Citations (Sorted by Pagerank)

Showing 11 of 11 citing papers.

Outgoing Citations (Sorted by Pagerank)

Showing 19 of 19 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
67 | The Merge/Purge Problem for Large Databases | 1995 | SIGMOD | 0.00061348205
94 | CrowdDB: Answering Queries with Crowdsourcing | 2011 | SIGMOD | 0.00051013264
150 | Integration of Heterogeneous Databases Without Common Domains Using Queries Based on Textual Similarity | 1998 | SIGMOD | 0.00041055843
263 | CrowdER: Crowdsourcing Entity Resolution | 2012 | VLDB | 0.00029862413
267 | Human-powered Sorts and Joins | 2012 | VLDB | 0.00029690405
509 | On Active Learning of Record Matching Packages | 2010 | SIGMOD | 0.00021409518
643 | Corleone: Hands-Off Crowdsourcing for Entity Matching | 2014 | SIGMOD | 0.00018754451
692 | Pay-as-you-go User Feedback for Dataspace Systems | 2008 | SIGMOD | 0.00018083948
859 | So Who Won? Dynamic Max Discovery with the Crowd | 2012 | SIGMOD | 0.00015870894
866 | Leveraging Transitive Relations for Crowdsourced Joins | 2013 | SIGMOD | 0.00015801196
936 | Framework for Evaluating Clustering Algorithms in Duplicate Detection | 2009 | VLDB | 0.0001521549
1,164 | CrowdScreen: Algorithms for Filtering Data with Humans | 2012 | SIGMOD | 0.00013564823
1,242 | Question Selection for Crowd Entity Resolution | 2013 | VLDB | 0.00013096655
1,491 | CDAS: A Crowdsourcing Data Analytics System | 2012 | VLDB | 0.00011694982
1,841 | Crowdsourcing Algorithms for Entity Resolution | 2014 | VLDB | 0.00010348858
1,885 | CrowdDB: Query Processing with the VLDB Crowd | 2011 | VLDB | 0.0001021098
3,100 | Crowd Mining | 2013 | SIGMOD | 7.5634778e-05
4,185 | Arnold: Declarative Crowd-Machine Data Integration | 2013 | CIDR | 6.3776356e-05
5,798 | Exploiting Context Analysis for Combining Multiple Entity Resolution Systems | 2009 | SIGMOD | 5.3231654e-05

Semantically Similar Papers

Overall Rank | Paper | Year | Venue | Pagerank
11,788 | CDB: Optimizing Queries with Crowd-Based Selections and Joins | 2017 | SIGMOD | 4.1945683e-05
4,416 | CrowdMatcher: Crowd-Assisted Schema Matching | 2014 | SIGMOD | 6.2039225e-05
263 | CrowdER: Crowdsourcing Entity Resolution | 2012 | VLDB | 0.00029862413
280 | Eliminating Fuzzy Duplicates in Data Warehouses | 2002 | VLDB | 0.00029113044
6,042 | MDedup: Duplicate Detection with Matching Dependencies | 2020 | VLDB | 5.2405269e-05
908 | Fusing Data with Correlations | 2014 | SIGMOD | 0.00015431241
3,360 | Modeling and Querying Possible Repairs in Duplicate Detection | 2009 | VLDB | 7.1742067e-05
936 | Framework for Evaluating Clustering Algorithms in Duplicate Detection | 2009 | VLDB | 0.0001521549
3,528 | Distributed Data Deduplication | 2016 | VLDB | 7.0066139e-05
2,386 | Leveraging Aggregate Constraints For Deduplication | 2007 | SIGMOD | 8.9231895e-05