Framework for Evaluating Clustering Algorithms in Duplicate Detection
Summary: Introduces Stringer, a framework for evaluating how well clustering algorithms detect duplicates when combined with scalable approximate joins. Finds that unconstrained clustering algorithms, paired with such joins, can outperform traditional methods in both accuracy and efficiency. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Oktie Hassanzadeh
2. Fei Chiang
3. Hyun Chul Lee
4. Renée J. Miller
Incoming Citations (Sorted by Pagerank)
Showing 22 of 22 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 7 of 7 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Overall Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 57 | Discovering Large Dense Subgraphs in Massive Graphs | 2005 | VLDB | 0.00065491112 |
| 250 | Efficient set joins on similarity predicates | 2004 | SIGMOD | 0.00030661988 |
| 266 | Efficient Exact Set-Similarity Joins | 2006 | VLDB | 0.00029718727 |
| 1,202 | VGRAM: Improving Performance of Approximate Queries on String Collections Using Variable-Length Grams | 2007 | VLDB | 0.00013326298 |
| 2,374 | Seeking Stable Clusters in the Blogosphere | 2007 | VLDB | 0.000089452874 |
| 3,267 | Benchmarking Declarative Approximate Selection Predicates | 2007 | SIGMOD | 0.000073058429 |
| 4,090 | Finding Near Neighbors Through Cluster Pruning | 2007 | PODS | 0.000064577834 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 4,383 | Incremental Record Linkage | 2014 | VLDB | 0.000062383094 |
| 2,386 | Leveraging Aggregate Constraints For Deduplication | 2007 | SIGMOD | 0.000089231895 |
| 6,810 | Record Linkage with Uniqueness Constraints and Erroneous Values | 2010 | VLDB | 0.000049203397 |
| 10,624 | Evaluating Methods for Efficient Entity Count Estimation | 2025 | VLDB | 0.000041945683 |
| 9,855 | Progressive Entity Matching: A Design Space Exploration | 2025 | SIGMOD | 0.00004269353 |
| 4,619 | Crowd-Based Deduplication: An Adaptive Approach | 2015 | SIGMOD | 0.000060444854 |
| 2,740 | String Similarity Joins: An Experimental Evaluation | 2014 | VLDB | 0.000081980628 |
| 322 | Record Linkage: Similarity Measures and Algorithms | 2006 | SIGMOD | 0.00027518768 |
| 280 | Eliminating Fuzzy Duplicates in Data Warehouses | 2002 | VLDB | 0.00029113044 |
| 3,360 | Modeling and Querying Possible Repairs in Duplicate Detection | 2009 | VLDB | 0.000071742067 |