MDedup: Duplicate Detection with Matching Dependencies
Summary: MDedup uses automatically discovered matching dependencies (MDs) for domain-free duplicate detection. A trained model selects MD-based rules from features and gold standards, with boosting to improve recall, achieving up to 94% F-measure and 100% precision. (summarized by gpt-5-nano on Feb 09 2026)
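To make the idea of a matching dependency concrete, the following is a minimal sketch, not the MDedup pipeline itself. It assumes an MD can be read as "if two records are sufficiently similar on every left-hand-side attribute, treat them as the same entity". The attribute names, the thresholds in `MD_THRESHOLDS`, and the use of `difflib.SequenceMatcher` as the similarity measure are all hypothetical choices made for illustration; MD discovery, the rule-selection model, and the boosting step are not shown.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (stand-in measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Hypothetical MD: name similar at >= 0.8 AND city similar at >= 0.9
# implies that the two records refer to the same entity.
MD_THRESHOLDS = {"name": 0.8, "city": 0.9}


def md_matches(r1: dict, r2: dict, thresholds: dict) -> bool:
    """True if the record pair satisfies every similarity threshold of the MD."""
    return all(
        similarity(r1[attr], r2[attr]) >= t for attr, t in thresholds.items()
    )


def find_duplicates(records: list, thresholds: dict) -> list:
    """Compare all record pairs and return index pairs flagged by the MD."""
    return [
        (i, j)
        for (i, r1), (j, r2) in combinations(enumerate(records), 2)
        if md_matches(r1, r2, thresholds)
    ]


records = [
    {"name": "Jon Smith", "city": "Berlin"},
    {"name": "John Smith", "city": "Berlin"},
    {"name": "Maria Lopez", "city": "Madrid"},
]

pairs = find_duplicates(records, MD_THRESHOLDS)
print(pairs)  # [(0, 1)]
```

The all-pairs comparison is quadratic and only suitable for small inputs; it is used here to keep the sketch short.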
Incoming Citations (Sorted by Pagerank)
Showing 11 of 11 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 8 of 8 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 254 | Snorkel: Rapid Training Data Creation with Weak Supervision | 2018 | VLDB | 0.00030540555 |
| 300 | Deep Learning for Entity Matching: A Design Space Exploration | 2018 | SIGMOD | 0.00028441466 |
| 560 | Dependencies Revisited for Improving Data Quality | 2008 | PODS | 0.00020141923 |
| 691 | AJAX: An Extensible Data Cleaning Tool | 2000 | SIGMOD | 0.00018086135 |
| 754 | Distributed Representations of Tuples for Entity Resolution | 2018 | VLDB | 0.00017117211 |
| 1,345 | Entity Matching: How Similar Is Similar | 2011 | VLDB | 0.00012468408 |
| 1,831 | Synthesizing Entity Matching Rules by Examples | 2018 | VLDB | 0.00010384082 |
| 2,038 | The return of JedAI: End-to-End Entity Resolution for Structured and Semi-Structured Data | 2018 | VLDB | 0.000097098952 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 9,487 | Making It Tractable to Catch Duplicates and Conflicts in Graphs | 2023 | SIGMOD | 0.000043341665 |
| 221 | Deep Entity Matching with Pre-Trained Language Models | 2021 | VLDB | 0.00033121824 |
| 2,386 | Leveraging Aggregate Constraints For Deduplication | 2007 | SIGMOD | 0.000089231895 |
| 6,553 | How do Categorical Duplicates Affect ML? A New Benchmark and Empirical Analyses | 2024 | VLDB | 0.000050157344 |
| 2,589 | DogmatiX Tracks down Duplicates in XML | 2005 | SIGMOD | 0.000084847146 |
| 4,619 | Crowd-Based Deduplication: An Adaptive Approach | 2015 | SIGMOD | 0.000060444854 |
| 5,235 | Industry-Scale Duplicate Detection | 2008 | VLDB | 0.000056115647 |
| 936 | Framework for Evaluating Clustering Algorithms in Duplicate Detection | 2009 | VLDB | 0.0001521549 |
| 3,360 | Modeling and Querying Possible Repairs in Duplicate Detection | 2009 | VLDB | 0.000071742067 |
| 280 | Eliminating Fuzzy Duplicates in Data Warehouses | 2002 | VLDB | 0.00029113044 |