Database Paper Browser

PACk: An Efficient Partition-based Distributed Agglomerative Hierarchical Clustering Algorithm for Deduplication

Summary: PACk is a partition-based distributed agglomerative hierarchical clustering (AHC) algorithm for deduplication that uses distance-based partitioning and distance-aware merging to scale out. Its Spark implementation delivers a 2x to 19x speedup (median 9x) over state-of-the-art distributed AHC on real and synthetic datasets. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 12625
Venue: VLDB
Year: 2022
Pagerank: 4.1945683e-05
Overall Rank: 11,368 | 20.92%
DOI: 10.14778/3514061.3514062

Incoming Non-self Citations Over Time

No non-self incoming citations found for this paper in this database.

Authors

Incoming Citations (Sorted by Pagerank)

Showing 0 of 0 citing papers.

Outgoing Citations (Sorted by Pagerank)

Showing 4 of 4 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Semantically Similar Papers