Database Paper Browser

Distributed Data Deduplication

Summary: Studies distributed data deduplication in a shared-nothing setting, using parallelism to speed up the pairwise comparisons that remain after blocking. Proposes Dis-Dedup, a distribution strategy that minimizes the maximum per-node workload with theoretical guarantees; experiments on synthetic and real data show scalable speedups. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 11374
Venue: VLDB
Year: 2016
Pagerank: 7.0066139e-05
Overall Rank: 3,528 | 75.46%
DOI: -

Incoming Citations (Sorted by Pagerank)

Showing 15 of 15 citing papers.

Outgoing Citations (Sorted by Pagerank)

Showing 12 of 12 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Semantically Similar Papers