Industry-Scale Duplicate Detection
Summary: DogmatiX, originally a duplicate detector for hierarchical XML data, is scaled to an industrial relational database in a project with Schufa. The work targets detection quality and scalability for data on 60 million individuals, addresses false negatives and false positives in credit histories, and includes a real-world evaluation. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Melanie Weis
2. Felix Naumann
3. Ulrich Jehle
4. Jens Lufter
5. Holger Schuster
Incoming Citations (Sorted by Pagerank)
Showing 2 of 2 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 702 | Reasoning about Record Matching Rules | 2009 | VLDB | 0.00017918203 |
| 7,867 | Learning Over Dirty Data Without Cleaning | 2020 | SIGMOD | 4.6320452e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 4 of 4 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 67 | The Merge/Purge Problem for Large Databases | 1995 | SIGMOD | 0.00061348205 |
| 280 | Eliminating Fuzzy Duplicates in Data Warehouses | 2002 | VLDB | 0.00029113044 |
| 1,533 | Example-driven Design of Efficient Record Matching Queries | 2007 | VLDB | 0.00011471971 |
| 2,589 | DogmatiX Tracks down Duplicates in XML | 2005 | SIGMOD | 8.4847146e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 4,619 | Crowd-Based Deduplication: An Adaptive Approach | 2015 | SIGMOD | 6.0444854e-05 |
| 6,690 | Parallel Discrepancy Detection and Incremental Detection | 2021 | VLDB | 4.9621556e-05 |
| 2,386 | Leveraging Aggregate Constraints For Deduplication | 2007 | SIGMOD | 8.9231895e-05 |
| 3,528 | Distributed Data Deduplication | 2016 | VLDB | 7.0066139e-05 |
| 7,056 | Efficient Discovery of XML Data Redundancies | 2006 | VLDB | 4.8492432e-05 |
| 6,042 | MDedup: Duplicate Detection with Matching Dependencies | 2020 | VLDB | 5.2405269e-05 |
| 936 | Framework for Evaluating Clustering Algorithms in Duplicate Detection | 2009 | VLDB | 0.0001521549 |
| 280 | Eliminating Fuzzy Duplicates in Data Warehouses | 2002 | VLDB | 0.00029113044 |
| 3,360 | Modeling and Querying Possible Repairs in Duplicate Detection | 2009 | VLDB | 7.1742067e-05 |
| 2,589 | DogmatiX Tracks down Duplicates in XML | 2005 | SIGMOD | 8.4847146e-05 |