Database Paper Browser

Corleone: Hands-Off Crowdsourcing for Entity Matching

Summary: Corleone enables hands-off crowdsourcing for entity matching (EM): the crowd carries out the entire EM workflow, so no developer is needed. By making crowdsourcing practical for enterprises and ordinary users, it has implications for crowdsourced RDBMS joins, cleaning learned models, and collecting complex information from the crowd. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 4781
Venue: SIGMOD
Year: 2014
Pagerank: 0.00018754451
Overall Rank: 643 | 95.53%
DOI: -

Incoming Citations (Sorted by Pagerank)

Showing 50 of 58 citing papers.

Rank Citing Paper Year Venue Pagerank
221 Deep Entity Matching with Pre-Trained Language Models 2021 VLDB 0.00033121824
300 Deep Learning for Entity Matching: A Design Space Exploration 2018 SIGMOD 0.00028441466
517 Can Foundation Models Wrangle Your Data? 2023 VLDB 0.00021169035
712 Magellan: Toward Building Entity Matching Management Systems 2016 VLDB 0.00017732426
754 Distributed Representations of Tuples for Entity Resolution 2018 VLDB 0.00017117211
791 ActiveClean: Interactive Data Cleaning For Statistical Modeling 2016 VLDB 0.00016629664
1,627 Data Cleaning: Overview and Emerging Challenges 2016 SIGMOD 0.00011086905
1,716 Chimera: Large-Scale Classification using Machine Learning, Rules, and Crowdsourcing 2014 VLDB 0.00010795718
1,831 Synthesizing Entity Matching Rules by Examples 2018 VLDB 0.00010384082
1,914 Creating Embeddings of Heterogeneous Relational Datasets for Data Integration Tasks 2020 SIGMOD 0.00010109102
2,175 Falcon: Scaling Up Hands-Off Crowdsourced Entity Matching to Build Cloud Services 2017 SIGMOD 9.3644117e-05
2,767 A Comprehensive Benchmark Framework for Active Learning Methods in Entity Matching 2020 SIGMOD 8.1513883e-05
3,640 Deep Learning for Blocking in Entity Matching: A Design Space Exploration 2021 VLDB 6.8891671e-05
4,104 Online Entity Resolution Using an Oracle 2016 VLDB 6.4493809e-05
4,126 Waldo: An Adaptive Human Interface for Crowd Entity Resolution 2017 SIGMOD 6.4314729e-05
4,278 Similarity Query Processing for High-Dimensional Data 2020 VLDB 6.2953764e-05
4,402 Smurf: Self-Service String Matching Using Random Forests 2019 VLDB 6.2195162e-05
4,451 CLAMShell: Speeding up Crowds for Low-latency Data Labeling 2016 VLDB 6.1738675e-05
4,607 Data Integration and Machine Learning: A Natural Synergy 2018 SIGMOD 6.0538827e-05
4,619 Crowd-Based Deduplication: An Adaptive Approach 2015 SIGMOD 6.0444854e-05
4,665 Argonaut: Macrotask Crowdsourcing for Complex Data Processing 2015 VLDB 6.0125329e-05
4,837 Entity Resolution with Hierarchical Graph Attention Networks 2022 SIGMOD 5.8892326e-05
4,989 BEER: Blocking for Effective Entity Resolution 2021 SIGMOD 5.7827362e-05
5,362 Cost-Effective Crowdsourced Entity Resolution: A Partial-Order Approach 2016 SIGMOD 5.5473503e-05
5,445 QFix: Diagnosing Errors through Query Histories 2017 SIGMOD 5.5020909e-05
5,622 Monotonic Cardinality Estimation of Similarity Selection: A Deep Learning Approach 2020 SIGMOD 5.4060403e-05
5,929 ActiveClean: An Interactive Data Cleaning Framework For Modern Machine Learning 2016 SIGMOD 5.2682177e-05
6,111 Why Big Data Industrial Systems Need Rules and What We Can Do About It 2015 SIGMOD 5.2049579e-05
6,868 Cost-Effective Data Annotation using Game-Based Crowdsourcing 2019 VLDB 4.9010083e-05
6,955 Inspector Gadget: A Data Programming-based Labeling System for Industrial Images 2021 VLDB 4.8864297e-05
7,013 Qualitative Data Cleaning 2016 VLDB 4.8619024e-05
7,117 Crowdsourced Data Management: Overview and Challenges 2017 SIGMOD 4.826509e-05
7,185 Certus: An Effective Entity Resolution Approach with Graph Differential Dependencies (GDDs) 2019 VLDB 4.8066159e-05
7,243 Data Integration and Machine Learning: A Natural Synergy 2018 VLDB 4.7913666e-05
7,575 Human-in-the-loop Outlier Detection 2020 SIGMOD 4.7068909e-05
7,668 Human-in-the-loop Data Integration 2017 VLDB 4.6834075e-05
7,780 A Natural Language Interface for Querying General and Individual Knowledge 2015 VLDB 4.6533677e-05
8,099 Sparkly: A Simple yet Surprisingly Strong TF/IDF Blocker for Entity Matching 2023 VLDB 4.5859317e-05
8,362 Minimizing Efforts in Validating Crowd Answers 2015 SIGMOD 4.5366717e-05
8,585 Robust Entity Resolution using Random Graphs 2018 SIGMOD 4.4905755e-05
8,593 Wisteria: Nurturing Scalable Data Cleaning Infrastructure 2015 VLDB 4.4891474e-05
8,694 Managing General and Individual Knowledge in Crowd Mining Applications 2015 CIDR 4.4661379e-05
8,908 Deep Active Alignment of Knowledge Graph Entities and Schemata 2023 SIGMOD 4.427232e-05
8,911 PromptEM: Prompt-tuning for Low-resource Generalized Entity Matching 2023 VLDB 4.427232e-05
9,056 A Data Quality Metric (DQM): How to Estimate the Number of Undetected Errors in Data Sets 2017 VLDB 4.4039656e-05
9,683 Hierarchical Entity Resolution using an Oracle 2022 SIGMOD 4.3047774e-05
9,684 How to Design Robust Algorithms using Noisy Comparison Oracle 2021 VLDB 4.3047774e-05
9,771 EasyDR: A Human-in-the-loop Error Detection and Repair Platform for Holistic Table Cleaning 2022 VLDB 4.2856106e-05
9,896 Towards Interpretable and Learnable Risk Analysis for Entity Resolution 2020 SIGMOD 4.2600049e-05
10,022 In-context Clustering-based Entity Resolution with Large Language Models: A Design Space Exploration 2026 SIGMOD 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 13 of 13 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank Cited Paper Year Venue Pagerank
94 CrowdDB: Answering Queries with Crowdsourcing 2011 SIGMOD 0.00051013264
249 Crowdsourced Databases: Query Processing with People 2011 CIDR 0.00030740523
263 CrowdER: Crowdsourcing Entity Resolution 2012 VLDB 0.00029862413
267 Human-powered Sorts and Joins 2012 VLDB 0.00029690405
319 Evaluation of entity resolution approaches on real-world match problems 2010 VLDB 0.00027781866
489 Data Curation at Scale: The Data Tamer System 2013 CIDR 0.00022030728
509 On Active Learning of Record Matching Packages 2010 SIGMOD 0.00021409518
859 So Who Won? Dynamic Max Discovery with the Crowd 2012 SIGMOD 0.00015870894
866 Leveraging Transitive Relations for Crowdsourced Joins 2013 SIGMOD 0.00015801196
1,242 Question Selection for Crowd Entity Resolution 2013 VLDB 0.00013096655
1,491 CDAS: A Crowdsourcing Data Analytics System 2012 VLDB 0.00011694982
2,809 Deco: A System for Declarative Crowdsourcing 2012 VLDB 8.0869896e-05
3,100 Crowd Mining 2013 SIGMOD 7.5634778e-05
