Distributed Data Deduplication
Summary: Addresses data deduplication in a shared-nothing setting, using parallelism to speed up the pairwise comparisons that remain after blocking. The main contribution is Dis-Dedup, a distribution strategy that minimizes the maximum per-node workload with theoretical guarantees. Experiments on synthetic and real data show scalable speedups. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Xu Chu
2. Ihab F. Ilyas
3. Paraschos Koutris
Incoming Citations (Sorted by Pagerank)
Showing 15 of 15 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 12 of 12 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 9,581 | Sharing Aggregate Computation for Distributed Queries | 2007 | SIGMOD | 4.3227214e-05 |
| 3,360 | Modeling and Querying Possible Repairs in Duplicate Detection | 2009 | VLDB | 7.1742067e-05 |
| 7,715 | Query Centric Partitioning and Allocation for Partially Replicated Database Systems | 2017 | SIGMOD | 4.6699261e-05 |
| 5,613 | Distributed Implementations of Dependency Discovery Algorithms | 2019 | VLDB | 5.4102298e-05 |
| 6,690 | Parallel Discrepancy Detection and Incremental Detection | 2021 | VLDB | 4.9621556e-05 |
| 4,619 | Crowd-Based Deduplication: An Adaptive Approach | 2015 | SIGMOD | 6.0444854e-05 |
| 1,957 | On the Design and Scalability of Distributed Shared-Data Databases | 2015 | SIGMOD | 9.9598319e-05 |
| 3,821 | Locality-aware Partitioning in Parallel Database Systems | 2015 | SIGMOD | 6.7281515e-05 |
| 2,618 | Distributing A Database For Parallelism | 1983 | SIGMOD | 8.4447319e-05 |
| 5,236 | Online Deduplication for Databases | 2017 | SIGMOD | 5.611324e-05 |