Database Paper Browser

Framework for Evaluating Clustering Algorithms in Duplicate Detection

Summary: Introduces Stringer, a framework for evaluating clustering algorithms for duplicate detection in combination with scalable approximate joins. Finds that unconstrained clustering algorithms, when paired with scalable approximate joins, can outperform traditional methods in both accuracy and efficiency. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 9829
Venue: VLDB
Year: 2009
Pagerank: 0.0001521549
Overall Rank: 936 | 93.50%
DOI: -

Incoming Citations (Sorted by Pagerank)

Showing 22 of 22 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
319 | Evaluation of entity resolution approaches on real-world match problems | 2010 | VLDB | 0.00027781866
1,242 | Question Selection for Crowd Entity Resolution | 2013 | VLDB | 0.00013096655
2,038 | The return of JedAI: End-to-End Entity Resolution for Structured and Semi-Structured Data | 2018 | VLDB | 9.7098952e-05
2,405 | Linking Temporal Records | 2011 | VLDB | 8.8815018e-05
3,177 | Evaluating Entity Resolution Results | 2010 | VLDB | 7.4367331e-05
3,396 | Automatic Data Repair: Are We Ready to Deploy? | 2024 | VLDB | 7.1455126e-05
4,383 | Incremental Record Linkage | 2014 | VLDB | 6.2383094e-05
4,607 | Data Integration and Machine Learning: A Natural Synergy | 2018 | SIGMOD | 6.0538827e-05
4,619 | Crowd-Based Deduplication: An Adaptive Approach | 2015 | SIGMOD | 6.0444854e-05
4,652 | On the Efficiency of K-Means Clustering: Evaluation, Optimization, and Algorithm Selection | 2021 | VLDB | 6.0228549e-05
5,586 | QuERy: A Framework for Integrating Entity Resolution with Query Processing | 2016 | VLDB | 5.4219548e-05
5,852 | Repairing Vertex Labels under Neighborhood Constraints | 2014 | VLDB | 5.3007132e-05
6,754 | Modeling Entity Evolution for Temporal Record Matching | 2014 | SIGMOD | 4.9384574e-05
6,894 | TableDC: Deep Clustering for Tabular Data | 2025 | SIGMOD | 4.8925595e-05
7,243 | Data Integration and Machine Learning: A Natural Synergy | 2018 | VLDB | 4.7913666e-05
7,345 | Linking Temporal Records for Profiling Entities | 2015 | SIGMOD | 4.756212e-05
7,706 | Tracking Entities in the Dynamic World: A Fast Algorithm for Matching Temporal Records | 2014 | VLDB | 4.6723595e-05
9,855 | Progressive Entity Matching: A Design Space Exploration | 2025 | SIGMOD | 4.269353e-05
10,022 | In-context Clustering-based Entity Resolution with Large Language Models: A Design Space Exploration | 2026 | SIGMOD | 4.1945683e-05
11,388 | Frost: A Platform for Benchmarking and Exploring Data Matching Results | 2022 | VLDB | 4.1945683e-05
12,148 | CHRONOS: Facilitating History Discovery by Linking Temporal Records | 2012 | VLDB | 4.1945683e-05
12,194 | Web Scale Taxonomy Cleansing | 2011 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 7 of 7 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Semantically Similar Papers

Overall Rank | Paper | Year | Venue | Pagerank
4,383 | Incremental Record Linkage | 2014 | VLDB | 6.2383094e-05
2,386 | Leveraging Aggregate Constraints For Deduplication | 2007 | SIGMOD | 8.9231895e-05
6,810 | Record Linkage with Uniqueness Constraints and Erroneous Values | 2010 | VLDB | 4.9203397e-05
10,624 | Evaluating Methods for Efficient Entity Count Estimation | 2025 | VLDB | 4.1945683e-05
9,855 | Progressive Entity Matching: A Design Space Exploration | 2025 | SIGMOD | 4.269353e-05
4,619 | Crowd-Based Deduplication: An Adaptive Approach | 2015 | SIGMOD | 6.0444854e-05
2,740 | String Similarity Joins: An Experimental Evaluation | 2014 | VLDB | 8.1980628e-05
322 | Record Linkage: Similarity Measures and Algorithms | 2006 | SIGMOD | 0.00027518768
280 | Eliminating Fuzzy Duplicates in Data Warehouses | 2002 | VLDB | 0.00029113044
3,360 | Modeling and Querying Possible Repairs in Duplicate Detection | 2009 | VLDB | 7.1742067e-05