Modeling and Querying Possible Repairs in Duplicate Detection
Summary: The paper models the outcome of duplicate detection as uncertain, arising from parameterized clustering, and proposes a compact uncertainty model for the space of possible repairs. The model supports efficient relational queries over repairs as well as new query types; experiments show that the approach scales. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. George Beskales
2. Mohamed A. Soliman
3. Ihab F. Ilyas
4. Shai Ben-David
Incoming Citations (Sorted by Pagerank)
Showing 9 of 9 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1,627 | Data Cleaning: Overview and Emerging Challenges | 2016 | SIGMOD | 0.00011086905 |
| 2,566 | Database Repairs and Consistent Query Answering: Origins and Further Developments | 2019 | PODS | 8.5243847e-05 |
| 2,823 | Interaction between Record Matching and Data Repairing | 2011 | SIGMOD | 8.0593894e-05 |
| 3,192 | Towards Dependable Data Repairing with Fixing Rules | 2014 | SIGMOD | 7.4095761e-05 |
| 6,670 | Explore or Exploit? Effective Strategies for Disambiguating Large Databases | 2010 | VLDB | 4.9672601e-05 |
| 6,705 | Consistent Query Answers in Inconsistent Probabilistic Databases | 2010 | SIGMOD | 4.9549359e-05 |
| 7,013 | Qualitative Data Cleaning | 2016 | VLDB | 4.8619024e-05 |
| 7,952 | Multi-Source Uncertain Entity Resolution at Yad Vashem: Transforming Holocaust Victim Reports into People | 2016 | SIGMOD | 4.613363e-05 |
| 12,002 | Tutorial: Uncertain Entity Resolution — Re-evaluating Entity Resolution in the Big Data Era | 2014 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 5 of 5 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 49 | Consistent Query Answers in Inconsistent Databases | 1999 | PODS | 0.00067660624 |
| 199 | Declarative Data Cleaning: Language, Model, and Algorithms | 2001 | VLDB | 0.00035041015 |
| 341 | CURE: An Efficient Clustering Algorithm for Large Databases | 1998 | SIGMOD | 0.00026810548 |
| 827 | On the Representation and Querying of Sets of Possible Worlds | 1987 | SIGMOD | 0.00016220185 |
| 2,386 | Leveraging Aggregate Constraints For Deduplication | 2007 | SIGMOD | 8.9231895e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 3,528 | Distributed Data Deduplication | 2016 | VLDB | 7.0066139e-05 |
| 1,612 | Detecting Data Errors: Where are we and what needs to be done? | 2016 | VLDB | 0.00011142794 |
| 6,042 | MDedup: Duplicate Detection with Matching Dependencies | 2020 | VLDB | 5.2405269e-05 |
| 2,386 | Leveraging Aggregate Constraints For Deduplication | 2007 | SIGMOD | 8.9231895e-05 |
| 2,823 | Interaction between Record Matching and Data Repairing | 2011 | SIGMOD | 8.0593894e-05 |
| 5,235 | Industry-Scale Duplicate Detection | 2008 | VLDB | 5.6115647e-05 |
| 265 | A Cost-Based Model and Effective Heuristic for Repairing Constraints by Value Modification | 2005 | SIGMOD | 0.00029763412 |
| 623 | Improving Data Quality: Consistency and Accuracy | 2007 | VLDB | 0.00018996374 |
| 280 | Eliminating Fuzzy Duplicates in Data Warehouses | 2002 | VLDB | 0.00029113044 |
| 936 | Framework for Evaluating Clustering Algorithms in Duplicate Detection | 2009 | VLDB | 0.0001521549 |