The Merge/Purge Problem for Large Databases
Summary: Defines the merge/purge problem for large multi-source databases: identifying records that refer to the same individual despite inconsistencies between sources. Compares sorted-neighborhood blocking with clustering; a multi-pass approach that takes the transitive closure of matches found under alternate keys improves accuracy at the cost of efficiency. (summarized by gpt-5-nano on Feb 09 2026)
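The approach named in the summary can be illustrated with a minimal sketch. It assumes a toy record layout and a toy `match` rule; the function names, the sample records, and the matching rule are hypothetical and are not taken from the paper. Each pass sorts the records on one key and compares only records that fall inside a small sliding window, and the matches from all passes are then merged by transitive closure (implemented here with union-find).

```python
def sorted_neighborhood_pairs(records, key, window, match):
    """One pass: sort on `key`, then compare each record only with the
    next `window - 1` records in sorted order."""
    order = sorted(range(len(records)), key=lambda i: key(records[i]))
    pairs = set()
    for pos, i in enumerate(order):
        for j in order[pos + 1 : pos + window]:
            if match(records[i], records[j]):
                pairs.add((min(i, j), max(i, j)))
    return pairs


def transitive_closure(n, pairs):
    """Group record indices so that any chain of matches ends up in one group."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def multi_pass(records, keys, window, match):
    """Run one sorted-neighborhood pass per key, then close over all matches."""
    pairs = set()
    for key in keys:
        pairs |= sorted_neighborhood_pairs(records, key, window, match)
    return transitive_closure(len(records), pairs)


# Hypothetical sample data and matching rule (assumptions for illustration only).
records = [
    {"first": "John", "last": "Smith", "zip": "10027"},
    {"first": "John", "last": "Smyth", "zip": "10027"},
    {"first": "Jon", "last": "Smith", "zip": "10027"},
    {"first": "Mary", "last": "Adams", "zip": "94305"},
]


def match(a, b):
    # Toy rule: same zip code and at least one name field identical.
    return a["zip"] == b["zip"] and (a["first"] == b["first"] or a["last"] == b["last"])


keys = [lambda r: r["last"], lambda r: r["first"]]
groups = multi_pass(records, keys, window=2, match=match)
```

With a window of 2, the pass sorted on last name links records 0 and 2, and the pass sorted on first name links records 0 and 1. Neither pass compares all three, but the closure places records 0, 1, and 2 in one group. The extra passes are where the added cost comes from.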
Incoming Citations (Sorted by Pagerank)
Showing 5 of 55 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 9,924 | On Saving Outliers for Better Clustering over Noisy Data | 2021 | SIGMOD | 4.2544238e-05 |
| 10,499 | Privacy and Accuracy-Aware AI/ML Model Deduplication | 2025 | SIGMOD | 4.1945683e-05 |
| 11,183 | Matching Roles from Temporal Data | 2023 | SIGMOD | 4.1945683e-05 |
| 12,425 | XClean in Action: A Demonstration of Declarative XML Data Cleaning | 2007 | CIDR | 4.1945683e-05 |
| 12,624 | Systematic Development of Data Mining-Based Data Quality Tools | 2003 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 2 of 2 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 84 | AlphaSort: A RISC Machine Sort | 1994 | SIGMOD | 0.00053866006 |
| 152 | An Evaluation of Non-Equijoin Algorithms | 1991 | VLDB | 0.00040963225 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 786 | New Strategies for Computing the Transitive Closure of a Database Relation | 1987 | VLDB | 0.00016660109 |
| 4,435 | Sampling Dirty Data for Matching Attributes | 2010 | SIGMOD | 6.1918164e-05 |
| 6,686 | An Algorithm For Servicing Multi-Relational Queries | 1977 | SIGMOD | 4.9624102e-05 |
| 3,204 | Progressive Merge Join: A Generic and Non-Blocking Sort-Based Join Algorithm | 2002 | VLDB | 7.3889212e-05 |
| 1,635 | An In-depth Comparison of Subgraph Isomorphism Algorithms in Graph Databases | 2013 | VLDB | 0.0001105793 |
| 2,514 | Comparative Analysis of Approximate Blocking Techniques for Entity Resolution | 2016 | VLDB | 8.6139012e-05 |
| 15 | Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters | 2007 | SIGMOD | 0.0010654262 |
| 6,810 | Record Linkage with Uniqueness Constraints and Erroneous Values | 2010 | VLDB | 4.9203397e-05 |
| 7,824 | Optimization of Multiple-Relation Multiple-Disjunct Queries | 1988 | PODS | 4.6418459e-05 |
| 3,529 | Merging the Results of Approximate Match Operations | 2004 | VLDB | 7.0059524e-05 |