Database Paper Browser

Industry-Scale Duplicate Detection

Summary: DogmatiX, originally a duplicate detector for hierarchical XML data, is scaled to an industrial relational database with Schufa. The work targets detection quality and scalability for 60M individuals, addresses false negatives and false positives in credit histories, and includes a real-world evaluation. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 9751
Venue: VLDB
Year: 2008
Pagerank: 5.6115647e-05
Overall Rank: 5,235 | 63.59%
DOI: -

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 2 of 2 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
702 | Reasoning about Record Matching Rules | 2009 | VLDB | 0.00017918203
7,867 | Learning Over Dirty Data Without Cleaning | 2020 | SIGMOD | 4.6320452e-05

Outgoing Citations (Sorted by Pagerank)

Showing 4 of 4 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
67 | The Merge/Purge Problem for Large Databases | 1995 | SIGMOD | 0.00061348205
280 | Eliminating Fuzzy Duplicates in Data Warehouses | 2002 | VLDB | 0.00029113044
1,533 | Example-driven Design of Efficient Record Matching Queries | 2007 | VLDB | 0.00011471971
2,589 | DogmatiX Tracks down Duplicates in XML | 2005 | SIGMOD | 8.4847146e-05

Semantically Similar Papers

Overall Rank | Paper | Year | Venue | Pagerank
4,619 | Crowd-Based Deduplication: An Adaptive Approach | 2015 | SIGMOD | 6.0444854e-05
6,690 | Parallel Discrepancy Detection and Incremental Detection | 2021 | VLDB | 4.9621556e-05
2,386 | Leveraging Aggregate Constraints For Deduplication | 2007 | SIGMOD | 8.9231895e-05
3,528 | Distributed Data Deduplication | 2016 | VLDB | 7.0066139e-05
7,056 | Efficient Discovery of XML Data Redundancies | 2006 | VLDB | 4.8492432e-05
6,042 | MDedup: Duplicate Detection with Matching Dependencies | 2020 | VLDB | 5.2405269e-05
936 | Framework for Evaluating Clustering Algorithms in Duplicate Detection | 2009 | VLDB | 0.0001521549
280 | Eliminating Fuzzy Duplicates in Data Warehouses | 2002 | VLDB | 0.00029113044
3,360 | Modeling and Querying Possible Repairs in Duplicate Detection | 2009 | VLDB | 7.1742067e-05
2,589 | DogmatiX Tracks down Duplicates in XML | 2005 | SIGMOD | 8.4847146e-05