Database Paper Browser


The Merge/Purge Problem for Large Databases

Summary: Defines the merge/purge problem for large multi-source databases: identifying records that refer to the same individual even when those records are inconsistent. Compares the sorted-neighborhood method with a clustering method; a multi-pass approach that computes the transitive closure over passes with alternate keys improves accuracy at the cost of efficiency. (summarized by gpt-5-nano on Feb 09 2026)
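To make the summary concrete, here is a minimal Python sketch of a sorted-neighborhood pass followed by a multi-pass transitive closure. It is an illustration under stated assumptions, not the paper's implementation: the record fields, the `is_match` rule (two of three fields equal), the window size, and all function names are hypothetical and chosen only for this example.

```python
def sorted_neighborhood_pairs(records, key, window, is_match):
    """Sort record indices by `key`, slide a fixed-size window over the
    sorted order, and return the index pairs that `is_match` accepts."""
    order = sorted(range(len(records)), key=lambda i: key(records[i]))
    pairs = set()
    for pos, i in enumerate(order):
        # Compare each record only with the next (window - 1) records.
        for j in order[pos + 1 : pos + window]:
            if is_match(records[i], records[j]):
                pairs.add((min(i, j), max(i, j)))
    return pairs


def transitive_closure(n, pairs):
    """Group indices 0..n-1 into clusters using union-find over `pairs`."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def multi_pass_merge(records, keys, window, is_match):
    """Run one sorted-neighborhood pass per key, union the matched pairs,
    and close them transitively."""
    pairs = set()
    for key in keys:
        pairs |= sorted_neighborhood_pairs(records, key, window, is_match)
    return transitive_closure(len(records), pairs)


# Hypothetical toy data: records 0, 1, and 3 describe the same person,
# with a typo in the last name (1) or the first name (3).
records = [
    {"first": "Ann", "last": "Smith", "zip": "10027"},
    {"first": "Ann", "last": "Zmith", "zip": "10027"},
    {"first": "Bob", "last": "Taylor", "zip": "94110"},
    {"first": "Nan", "last": "Smith", "zip": "10027"},
]


def is_match(a, b):
    # Assumed rule for illustration: at least two of three fields agree.
    return sum(a[f] == b[f] for f in ("first", "last", "zip")) >= 2


def by_last(r):
    return r["last"]


def by_first(r):
    return r["first"]


single_pass = multi_pass_merge(records, [by_last], 2, is_match)
two_passes = multi_pass_merge(records, [by_last, by_first], 2, is_match)
print(single_pass)
print(two_passes)
```

With a window of 2, the pass keyed on last name pairs records 0 and 3 only, while the pass keyed on first name pairs records 0 and 1. The closure over both passes places 0, 1, and 3 in one cluster even though records 1 and 3 are never matched directly, which is the accuracy gain the summary attributes to the multi-pass approach. Each additional key costs another sort and another window scan.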

Paper ID: 2794
Venue: SIGMOD
Year: 1995
Pagerank: 0.00061348205
Overall Rank: 67 | 99.54%
DOI: -

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 5 of 55 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
9,924 | On Saving Outliers for Better Clustering over Noisy Data | 2021 | SIGMOD | 4.2544238e-05
10,499 | Privacy and Accuracy-Aware AI/ML Model Deduplication | 2025 | SIGMOD | 4.1945683e-05
11,183 | Matching Roles from Temporal Data | 2023 | SIGMOD | 4.1945683e-05
12,425 | XClean in Action: A Demonstration of Declarative XML Data Cleaning | 2007 | CIDR | 4.1945683e-05
12,624 | Systematic Development of Data Mining-Based Data Quality Tools | 2003 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 2 of 2 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
84 | AlphaSort: A RISC Machine Sort | 1994 | SIGMOD | 0.00053866006
152 | An Evaluation of Non-Equijoin Algorithms | 1991 | VLDB | 0.00040963225

Semantically Similar Papers