Leveraging Aggregate Constraints For Deduplication
Summary: The paper uses aggregate constraints, as opposed to pairwise ones, to improve cross-source deduplication in data integration. It defines a restricted search space, solves the problem optimally within that space, and shows substantial accuracy gains on real data despite semantic and computational challenges. (summarized by gpt-5-nano on Feb 09 2026)
Incoming Non-self Citations Over Time
Incoming Citations (Sorted by Pagerank)
Showing 8 of 8 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 702 | Reasoning about Record Matching Rules | 2009 | VLDB | 0.00017918203 |
| 3,130 | Behavior Based Record Linkage | 2010 | VLDB | 7.4993061e-05 |
| 3,360 | Modeling and Querying Possible Repairs in Duplicate Detection | 2009 | VLDB | 7.1742067e-05 |
| 3,578 | Efficient Approximate Entity Extraction with Edit Distance Constraints | 2009 | SIGMOD | 6.9503858e-05 |
| 4,375 | Sample Debiasing in the Themis Open World Database System | 2020 | SIGMOD | 6.2427076e-05 |
| 5,887 | Efficient Approximate Search on String Collections (Tutorial) | 2009 | VLDB | 5.2879769e-05 |
| 6,079 | Querying Uncertain Data with Aggregate Constraints | 2011 | SIGMOD | 5.2223439e-05 |
| 6,810 | Record Linkage with Uniqueness Constraints and Erroneous Values | 2010 | VLDB | 4.9203397e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 6 of 6 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 67 | The Merge/Purge Problem for Large Databases | 1995 | SIGMOD | 0.00061348205 |
| 155 | Robust and Efficient Fuzzy Match for Online Data Cleaning | 2003 | SIGMOD | 0.00040637896 |
| 229 | Reference Reconciliation in Complex Information Spaces | 2005 | SIGMOD | 0.00032242633 |
| 265 | A Cost-Based Model and Effective Heuristic for Repairing Constraints by Value Modification | 2005 | SIGMOD | 0.00029763412 |
| 280 | Eliminating Fuzzy Duplicates in Data Warehouses | 2002 | VLDB | 0.00029113044 |
| 322 | Record Linkage: Similarity Measures and Algorithms | 2006 | SIGMOD | 0.00027518768 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 445 | The Magic of Duplicates and Aggregates | 1990 | VLDB | 0.0002294367 |
| 265 | A Cost-Based Model and Effective Heuristic for Repairing Constraints by Value Modification | 2005 | SIGMOD | 0.00029763412 |
| 8,721 | Aggregated Deletion Propagation for Counting Conjunctive Query Answers | 2021 | VLDB | 4.4608778e-05 |
| 3,528 | Distributed Data Deduplication | 2016 | VLDB | 7.0066139e-05 |
| 936 | Framework for Evaluating Clustering Algorithms in Duplicate Detection | 2009 | VLDB | 0.0001521549 |
| 10,277 | Efficient Query Repair for Aggregate Constraints | 2026 | VLDB | 4.1945683e-05 |
| 12,191 | Efficient Rank Join with Aggregation Constraints | 2011 | VLDB | 4.1945683e-05 |
| 4,619 | Crowd-Based Deduplication: An Adaptive Approach | 2015 | SIGMOD | 6.0444854e-05 |
| 3,360 | Modeling and Querying Possible Repairs in Duplicate Detection | 2009 | VLDB | 7.1742067e-05 |
| 2,544 | Aggregation and Relevance in Deductive Databases | 1991 | VLDB | 8.5730083e-05 |