Database Paper Browser


Record Linkage: Similarity Measures and Algorithms

Summary: Formalizes record linkage, defines variants of the problem, and surveys attribute-similarity predicates and approaches for combining them into approximate joins. Also delivers a cohesive expert survey of results, techniques, and tools for entity clustering and consistent partitioning, and outlines open research directions. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 3831
Venue: SIGMOD
Year: 2006
Pagerank: 0.00027518768
Overall Rank: 322 (97.77%)
DOI: -



Incoming Citations (Sorted by Pagerank)

Showing 46 of 46 citing papers.

Overall Rank | Citing Paper | Year | Venue | Pagerank
192 | HoloClean: Holistic Data Repairs with Probabilistic Inference | 2017 | VLDB | 0.00035728858
319 | Evaluation of entity resolution approaches on real-world match problems | 2010 | VLDB | 0.00027781866
814 | Entity Resolution: Theory, Practice & Open Challenges | 2012 | VLDB | 0.00016370594
1,202 | VGRAM: Improving Performance of Approximate Queries on String Collections Using Variable-Length Grams | 2007 | VLDB | 0.00013326298
1,345 | Entity Matching: How Similar Is Similar | 2011 | VLDB | 0.00012468408
1,533 | Example-driven Design of Efficient Record Matching Queries | 2007 | VLDB | 0.00011471971
1,722 | Building Structured Web Community Portals: A Top-Down, Compositional, and Incremental Approach | 2007 | VLDB | 0.00010757784
2,073 | Extending Autocompletion To Tolerate Errors | 2009 | SIGMOD | 9.6142791e-05
2,193 | Cost-Based Variable-Length-Gram Selection for String Collections to Support Approximate Queries Efficiently | 2008 | SIGMOD | 9.3178557e-05
2,386 | Leveraging Aggregate Constraints For Deduplication | 2007 | SIGMOD | 8.9231895e-05
2,405 | Linking Temporal Records | 2011 | VLDB | 8.8815018e-05
3,105 | Data X-Ray: A Diagnostic Tool for Data Errors | 2015 | SIGMOD | 7.5568954e-05
3,130 | Behavior Based Record Linkage | 2010 | VLDB | 7.4993061e-05
3,140 | ZeroER: Entity Resolution using Zero Labeled Examples | 2020 | SIGMOD | 7.4841763e-05
3,230 | Learning Semantic String Transformations from Examples | 2012 | VLDB | 7.339123e-05
3,267 | Benchmarking Declarative Approximate Selection Predicates | 2007 | SIGMOD | 7.3058429e-05
3,451 | Learning String Transformations From Examples | 2009 | VLDB | 7.0822216e-05
3,631 | On-the-Fly Entity-Aware Query Processing in the Presence of Linkage | 2010 | VLDB | 6.9014378e-05
3,711 | Saga: A Platform for Continuous Construction and Serving of Knowledge At Scale | 2022 | SIGMOD | 6.823609e-05
4,137 | Exploiting Content Redundancy for Web Information Extraction | 2010 | VLDB | 6.4181549e-05
4,537 | Privacy Preserving Schema and Data Matching | 2007 | SIGMOD | 6.1042536e-05
4,951 | Mining Document Collections to Facilitate Accurate Approximate Entity Matching | 2009 | VLDB | 5.8100413e-05
4,988 | Incremental Maintenance of Length Normalized Indexes for Approximate String Matching | 2009 | SIGMOD | 5.783959e-05
5,094 | Global Detection of Complex Copying Relationships Between Sources | 2010 | VLDB | 5.7023083e-05
5,228 | Schema-agnostic vs Schema-based Configurations for Blocking Methods on Homogeneous Data | 2016 | VLDB | 5.6158315e-05
5,445 | QFix: Diagnosing Errors through Query Histories | 2017 | SIGMOD | 5.5020909e-05
5,536 | On Indexing Error-Tolerant Set Containment | 2010 | SIGMOD | 5.4532734e-05
5,652 | From Information to Knowledge: Harvesting Entities and Relationships from Web Sources | 2010 | PODS | 5.3903671e-05
5,887 | Efficient Approximate Search on String Collections (Tutorial) | 2009 | VLDB | 5.2879769e-05
6,754 | Modeling Entity Evolution for Temporal Record Matching | 2014 | SIGMOD | 4.9384574e-05
6,810 | Record Linkage with Uniqueness Constraints and Erroneous Values | 2010 | VLDB | 4.9203397e-05
6,818 | NLyze: Interactive Programming by Natural Language for SpreadSheet Data Analysis and Manipulation | 2014 | SIGMOD | 4.916347e-05
7,109 | Efficient Similarity Join and Search on Multi-Attribute Data | 2015 | SIGMOD | 4.8292998e-05
7,237 | CleanM: An Optimizable Query Language for Unified Scale-Out Data Cleaning | 2017 | VLDB | 4.7928651e-05
7,345 | Linking Temporal Records for Profiling Entities | 2015 | SIGMOD | 4.756212e-05
7,407 | Intermittent Query Processing | 2019 | VLDB | 4.7373205e-05
7,549 | SOLOMON: Seeking the Truth Via Copying Detection | 2010 | VLDB | 4.7137426e-05
7,669 | Incorporating String Transformations in Record Matching | 2008 | SIGMOD | 4.6833751e-05
8,007 | A Grammar-based Entity Representation Framework for Data Cleaning | 2009 | SIGMOD | 4.6068018e-05
8,137 | Customizable and Scalable Fuzzy Join for Big Data | 2019 | VLDB | 4.5774794e-05
8,932 | Comparative evaluation of entity resolution approaches with FEVER | 2009 | VLDB | 4.427232e-05
9,932 | Local Filtering: Improving the Performance of Approximate Queries on String Collections | 2015 | SIGMOD | 4.2500258e-05
10,358 | Robust Statistical Analysis on Streaming Data with Near-Duplicates in General Metric Spaces | 2025 | PODS | 4.1945683e-05
11,833 | Streaming Algorithms for Robust Distinct Elements | 2016 | SIGMOD | 4.1945683e-05
11,979 | Similarity Joins for Uncertain Strings | 2014 | SIGMOD | 4.1945683e-05
12,478 | Randomized Algorithms for Data Reconciliation in Wide Area Aggregate Query Processing | 2007 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 2 of 2 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Overall Rank | Cited Paper | Year | Venue | Pagerank
507 | Data Quality and Data Cleaning: An Overview | 2003 | SIGMOD | 0.00021473263
9,430 | Approximate Joins: Concepts and Techniques | 2005 | VLDB | 4.3441378e-05

Semantically Similar Papers

Overall Rank | Paper | Year | Venue | Pagerank
8,549 | LinkDB: A Probabilistic Linkage Database System | 2011 | SIGMOD | 4.4937074e-05
936 | Framework for Evaluating Clustering Algorithms in Duplicate Detection | 2009 | VLDB | 0.0001521549
8,899 | Fast Approximate Similarity Join in Vector Databases | 2025 | SIGMOD | 4.427232e-05
3,529 | Merging the Results of Approximate Match Operations | 2004 | VLDB | 7.0059524e-05
1,345 | Entity Matching: How Similar Is Similar | 2011 | VLDB | 0.00012468408
1,533 | Example-driven Design of Efficient Record Matching Queries | 2007 | VLDB | 0.00011471971
3,130 | Behavior Based Record Linkage | 2010 | VLDB | 7.4993061e-05
4,383 | Incremental Record Linkage | 2014 | VLDB | 6.2383094e-05
6,810 | Record Linkage with Uniqueness Constraints and Erroneous Values | 2010 | VLDB | 4.9203397e-05
9,430 | Approximate Joins: Concepts and Techniques | 2005 | VLDB | 4.3441378e-05