Database Paper Browser

The Merge/Purge Problem for Large Databases

Summary: Defines the merge/purge problem for large multi-source databases: identifying records that refer to the same individual even though the records are inconsistent across sources. Compares sorted-neighborhood blocking with a clustering approach; running multiple passes over alternate keys and taking the transitive closure of the matches improves accuracy at the cost of efficiency. (summarized by gpt-5-nano on Feb 09 2026)
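
The multi-pass idea in the summary can be illustrated with a minimal Python sketch. It is not the paper's implementation: the record fields, the `is_match` rule (agreement on at least two of three fields), the key functions, and the window size are all assumptions chosen for illustration. Each pass sorts the records on one key and compares only records that fall within a sliding window; a union-find structure then takes the transitive closure of the matches found across passes.

```python
FIELDS = ("first", "last", "addr")


def is_match(a, b):
    """Hypothetical matching rule: records agree on at least two fields."""
    agree = sum(a[f].lower() == b[f].lower() for f in FIELDS)
    return agree >= 2


def find(parent, i):
    """Return the root of i's cluster, shortening the path as we go."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def union(parent, i, j):
    """Merge the clusters of i and j; the smaller index becomes the root."""
    ri, rj = find(parent, i), find(parent, j)
    if ri != rj:
        parent[max(ri, rj)] = min(ri, rj)


def sorted_neighborhood_pass(records, key, window, parent):
    """One pass: sort by key, compare each record with the window - 1 before it."""
    order = sorted(range(len(records)), key=lambda i: key(records[i]))
    for pos, i in enumerate(order):
        for j in order[max(0, pos - window + 1):pos]:
            if is_match(records[i], records[j]):
                union(parent, i, j)


def merge_purge(records, keys, window=2):
    """Run one pass per key, then return the transitive closure as clusters."""
    parent = list(range(len(records)))
    for key in keys:
        sorted_neighborhood_pass(records, key, window, parent)
    clusters = {}
    for i in range(len(records)):
        clusters.setdefault(find(parent, i), []).append(i)
    return sorted(clusters.values())


# Illustrative records (made up for this sketch).
RECORDS = [
    {"first": "John", "last": "Smith", "addr": "12 Oak St"},
    {"first": "Jon", "last": "Smith", "addr": "12 Oak St"},
    {"first": "John", "last": "Smyth", "addr": "12 Oak St"},
    {"first": "Mary", "last": "Jones", "addr": "9 Elm Rd"},
    {"first": "Mary", "last": "Jones", "addr": "9 Elm Road"},
    {"first": "Alan", "last": "Brown", "addr": "4 Pine Ave"},
]


def by_last(r):
    return r["last"].lower()


def by_first(r):
    return r["first"].lower()


print(merge_purge(RECORDS, [by_last]))            # [[0, 1], [2], [3, 4], [5]]
print(merge_purge(RECORDS, [by_last, by_first]))  # [[0, 1, 2], [3, 4], [5]]
```

In this toy data, a single pass on the last-name key misses record 2 because it matches only record 0, which is outside the window. The second pass on the first-name key places records 0 and 2 next to each other, and the closure joins all three, at the price of an extra sort and scan.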

Paper ID: 2794
Venue: SIGMOD
Year: 1995
Pagerank: 0.00061348205
Overall Rank: 67 | 99.54%
DOI: -

Authors

Incoming Citations (Sorted by Pagerank)

Showing 50 of 55 citing papers.

Rank Citing Paper Year Venue Pagerank
150 Integration of Heterogeneous Databases Without Common Domains Using Queries Based on Textual Similarity 1998 SIGMOD 0.00041055843
155 Robust and Efficient Fuzzy Match for Online Data Cleaning 2003 SIGMOD 0.00040637896
199 Declarative Data Cleaning: Language, Model, and Algorithms 2001 VLDB 0.00035041015
229 Reference Reconciliation in Complex Information Spaces 2005 SIGMOD 0.00032242633
266 Efficient Exact Set-Similarity Joins 2006 VLDB 0.00029718727
280 Eliminating Fuzzy Duplicates in Data Warehouses 2002 VLDB 0.00029113044
319 Evaluation of entity resolution approaches on real-world match problems 2010 VLDB 0.00027781866
509 On Active Learning of Record Matching Packages 2010 SIGMOD 0.00021409518
627 Management of Probabilistic Data: Foundations and Challenges 2007 PODS 0.00018959005
637 Automatic segmentation of text into structured records 2001 SIGMOD 0.00018824614
702 Reasoning about Record Matching Rules 2009 VLDB 0.00017918203
814 Entity Resolution: Theory, Practice & Open Challenges 2012 VLDB 0.00016370594
818 Finding Related Tables 2012 SIGMOD 0.00016311524
1,146 Estimating Alphanumeric Selectivity in the Presence of Wildcards 1996 SIGMOD 0.00013679782
1,242 Question Selection for Crowd Entity Resolution 2013 VLDB 0.00013096655
1,345 Entity Matching: How Similar Is Similar 2011 VLDB 0.00012468408
1,379 Substring Selectivity Estimation 1999 PODS 0.00012286879
1,410 Entity Resolution with Iterative Blocking 2009 SIGMOD 0.00012127555
1,533 Example-driven Design of Efficient Record Matching Queries 2007 VLDB 0.00011471971
1,908 Information-Theoretic Tools for Mining Database Structure from Large Data Sets 2004 SIGMOD 0.00010126101
1,970 Approximate Lineage for Probabilistic Databases 2008 VLDB 9.896375e-05
2,386 Leveraging Aggregate Constraints For Deduplication 2007 SIGMOD 8.9231895e-05
2,514 Comparative Analysis of Approximate Blocking Techniques for Entity Resolution 2016 VLDB 8.6139012e-05
2,589 DogmatiX Tracks down Duplicates in XML 2005 SIGMOD 8.4847146e-05
2,722 Progressive Approach to Relational Entity Resolution 2014 VLDB 8.2338356e-05
3,177 Evaluating Entity Resolution Results 2010 VLDB 7.4367331e-05
3,528 Distributed Data Deduplication 2016 VLDB 7.0066139e-05
3,529 Merging the Results of Approximate Match Operations 2004 VLDB 7.0059524e-05
3,532 Entity Resolution with Evolving Rules 2010 VLDB 7.0020216e-05
3,712 MOMA - A Mapping-based Object Matching System 2007 CIDR 6.823134e-05
4,438 Selectivity Estimation for Fuzzy String Predicates in Large Data Sets 2005 VLDB 6.1898903e-05
4,619 Crowd-Based Deduplication: An Adaptive Approach 2015 SIGMOD 6.0444854e-05
4,707 Object-level Vertical Search 2007 CIDR 5.9810753e-05
4,873 Power-Law Based Estimation of Set Similarity Join Size 2009 VLDB 5.8602304e-05
4,974 Supervised Meta-blocking 2014 VLDB 5.7903293e-05
4,989 BEER: Blocking for Effective Entity Resolution 2021 SIGMOD 5.7827362e-05
5,228 Schema-agnostic vs Schema-based Configurations for Blocking Methods on Homogeneous Data 2016 VLDB 5.6158315e-05
5,235 Industry-Scale Duplicate Detection 2008 VLDB 5.6115647e-05
5,282 Deep Indexed Active Learning for Matching Heterogeneous Entity Representations 2022 VLDB 5.5864206e-05
5,586 QuERy: A Framework for Integrating Entity Resolution with Query Processing 2016 VLDB 5.4219548e-05
5,778 Telcordia's Database Reconciliation and Data Quality Analysis Tool 2000 VLDB 5.3308297e-05
5,798 Exploiting Context Analysis for Combining Multiple Entity Resolution Systems 2009 SIGMOD 5.3231654e-05
6,175 Query-Driven Approach to Entity Resolution 2013 VLDB 5.169496e-05
7,061 Serving Deep Learning Models with Deduplication from Relational Databases 2022 VLDB 4.8463881e-05
7,185 Certus: An Effective Entity Resolution Approach with Graph Differential Dependencies (GDDs) 2019 VLDB 4.8066159e-05
7,725 Data Cleaning in Microsoft SQL Server 2005 2005 SIGMOD 4.6670883e-05
7,777 Indexing Mixed Types for Approximate Retrieval 2005 VLDB 4.653704e-05
7,867 Learning Over Dirty Data Without Cleaning 2020 SIGMOD 4.6320452e-05
8,632 Measuring the Structural Similarity of Semistructured Documents Using Entropy 2007 VLDB 4.4803734e-05
9,725 On Concise Set of Relative Candidate Keys 2014 VLDB 4.2945121e-05

Outgoing Citations (Sorted by Pagerank)

Showing 2 of 2 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank Cited Paper Year Venue Pagerank
84 AlphaSort: A RISC Machine Sort 1994 SIGMOD 0.00053866006
152 An Evaluation of Non-Equijoin Algorithms 1991 VLDB 0.00040963225
