Database Paper Browser

Human-in-the-loop Data Integration

Summary: A hybrid human–machine approach to data integration for entity matching. Learned rules and DIMA propose candidate matches; a crowd-driven selection-inference-refine workflow then verifies those candidates, using transitivity-based inference, through the SQL-like CDB system on crowdsourcing platforms. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 11509
Venue: VLDB
Year: 2017
Pagerank: 4.6834075e-05
Overall Rank: 7,668 | 46.66%
DOI: -

Incoming Citations (Sorted by Pagerank)

Showing 9 of 9 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
782 | QTune: A Query-Aware Database Tuning System with Deep Reinforcement Learning | 2019 | VLDB | 0.00016729063
2,730 | Open Data Integration | 2018 | VLDB | 8.2126735e-05
3,473 | AI Meets Database: AI4DB and DB4AI | 2021 | SIGMOD | 7.062864e-05
6,868 | Cost-Effective Data Annotation using Game-Based Crowdsourcing | 2019 | VLDB | 4.9010083e-05
8,008 | Entity Resolution On-Demand | 2022 | VLDB | 4.6067684e-05
9,896 | Towards Interpretable and Learnable Risk Analysis for Entity Resolution | 2020 | SIGMOD | 4.2600049e-05
10,216 | The Case For Language Model Approximated LIKE Predicate | 2026 | SIGMOD | 4.1945683e-05
10,617 | Deduplicated Sampling On-Demand | 2025 | VLDB | 4.1945683e-05
11,707 | A Rating-Ranking Method for Crowdsourced Top-k Computation | 2018 | SIGMOD | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 43 of 43 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
94 | CrowdDB: Answering Queries with Crowdsourcing | 2011 | SIGMOD | 0.00051013264
249 | Crowdsourced Databases: Query Processing with People | 2011 | CIDR | 0.00030740523
250 | Efficient set joins on similarity predicates | 2004 | SIGMOD | 0.00030661988
263 | CrowdER: Crowdsourcing Entity Resolution | 2012 | VLDB | 0.00029862413
266 | Efficient Exact Set-Similarity Joins | 2006 | VLDB | 0.00029718727
267 | Human-powered Sorts and Joins | 2012 | VLDB | 0.00029690405
447 | Efficient Parallel Set-Similarity Joins Using MapReduce | 2010 | SIGMOD | 0.00022900171
643 | Corleone: Hands-Off Crowdsourcing for Entity Matching | 2014 | SIGMOD | 0.00018754451
712 | Magellan: Toward Building Entity Matching Management Systems | 2016 | VLDB | 0.00017732426
859 | So Who Won? Dynamic Max Discovery with the Crowd | 2012 | SIGMOD | 0.00015870894
866 | Leveraging Transitive Relations for Crowdsourced Joins | 2013 | SIGMOD | 0.00015801196
1,164 | CrowdScreen: Algorithms for Filtering Data with Humans | 2012 | SIGMOD | 0.00013564823
1,234 | Ed-Join: An Efficient Algorithm for Similarity Joins With Edit Distance Constraints | 2008 | VLDB | 0.00013122499
1,242 | Question Selection for Crowd Entity Resolution | 2013 | VLDB | 0.00013096655
1,305 | Bayesian Locality Sensitive Hashing for Fast Similarity Search | 2012 | VLDB | 0.00012687101
1,345 | Entity Matching: How Similar Is Similar | 2011 | VLDB | 0.00012468408
1,396 | Can We Beat the Prefix Filtering? An Adaptive Framework for Similarity Join and Search | 2012 | SIGMOD | 0.00012204748
1,410 | Entity Resolution with Iterative Blocking | 2009 | SIGMOD | 0.00012127555
1,715 | V-SMART-Join: A Scalable MapReduce Framework for All-Pair Similarity Joins of Multisets and Vectors | 2012 | VLDB | 0.00010803271
2,024 | ATLAS: A Probabilistic Algorithm for High Dimensional Similarity Search | 2011 | SIGMOD | 9.7519678e-05
2,175 | Falcon: Scaling Up Hands-Off Crowdsourced Entity Matching to Build Cloud Services | 2017 | SIGMOD | 9.3644117e-05
2,334 | Counting with the Crowd | 2013 | VLDB | 9.0161817e-05
2,567 | Resolving Conflicts in Heterogeneous Data by Truth Discovery and Source Reliability Estimation | 2014 | SIGMOD | 8.5239306e-05
2,592 | Pass-Join: A Partition-based Method for Similarity Joins | 2012 | VLDB | 8.4795761e-05
2,740 | String Similarity Joins: An Experimental Evaluation | 2014 | VLDB | 8.1980628e-05
2,809 | Deco: A System for Declarative Crowdsourcing | 2012 | VLDB | 8.0869896e-05
2,937 | Truth Inference in Crowdsourcing: Is the Problem Solved? | 2017 | VLDB | 7.853108e-05
3,067 | CrowdFill: Collecting Structured Data from the Crowd | 2014 | SIGMOD | 7.6180371e-05
3,263 | QASCA: A Quality-Aware Task Assignment System for Crowdsourcing Applications | 2015 | SIGMOD | 7.3097573e-05
3,322 | iCrowd: An Adaptive Crowdsourcing Framework | 2015 | SIGMOD | 7.2230626e-05
3,645 | Large-Scale Collective Entity Matching | 2011 | VLDB | 6.8853274e-05
3,977 | BLAST: a Loosely Schema-aware Meta-blocking Approach for Entity Resolution | 2016 | VLDB | 6.5736268e-05
4,011 | A Confidence-Aware Approach for Truth Discovery on Long-Tail Data | 2015 | VLDB | 6.5343479e-05
4,050 | An Efficient Partition Based Method for Exact Set Similarity Joins | 2016 | VLDB | 6.4953612e-05
4,619 | Crowd-Based Deduplication: An Adaptive Approach | 2015 | SIGMOD | 6.0444854e-05
5,081 | Reducing Uncertainty of Schema Matching via Crowdsourcing | 2013 | VLDB | 5.7132042e-05
5,362 | Cost-Effective Crowdsourced Entity Resolution: A Partial-Order Approach | 2016 | SIGMOD | 5.5473503e-05
6,290 | Putting Context into Schema Matching | 2006 | VLDB | 5.1271647e-05
6,605 | Dima: A Distributed In-Memory Similarity-Based Query Processing System | 2017 | VLDB | 4.9965703e-05
7,109 | Efficient Similarity Join and Search on Multi-Attribute Data | 2015 | SIGMOD | 4.8292998e-05
7,588 | Scalable Column Concept Determination for Web Tables Using Large Knowledge Bases | 2013 | VLDB | 4.7030914e-05
9,567 | META: An Efficient Matching-Based Method for Error-Tolerant Autocompletion | 2016 | VLDB | 4.3254416e-05
11,788 | CDB: Optimizing Queries with Crowd-Based Selections and Joins | 2017 | SIGMOD | 4.1945683e-05

Semantically Similar Papers