Industry-Scale Duplicate Detection

Summary: DogmatiX, originally a duplicate detector for hierarchical XML data, is scaled to an industrial relational database with Schufa. The work targets detection quality and scalability for 60M individuals, addresses false negatives and false positives in credit histories, and includes a real-world evaluation. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 9752
Venue: VLDB
Year: 2008
Pagerank: 5.6084247e-05
Overall Rank: 5,235 | 63.62%
DOI: -

Incoming Citations (Sorted by Pagerank)

Showing 2 of 2 citing papers.

Rank	Citing Paper	Year	Venue	Pagerank
700	Reasoning about Record Matching Rules	2009	VLDB	0.00017927576
7,868	Learning Over Dirty Data Without Cleaning	2020	SIGMOD	4.6276013e-05

Outgoing Citations (Sorted by Pagerank)

Showing 4 of 4 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank	Cited Paper	Year	Venue	Pagerank
67	The Merge/Purge Problem for Large Databases	1995	SIGMOD	0.00061419648
279	Eliminating Fuzzy Duplicates in Data Warehouses	2002	VLDB	0.00029141798
1,530	Example-driven Design of Efficient Record Matching Queries	2007	VLDB	0.00011483613
2,591	DogmatiX Tracks down Duplicates in XML	2005	SIGMOD	8.4851409e-05

Semantically Similar Papers

Overall Rank	Paper	Year	Venue	Pagerank
4,617	Crowd-Based Deduplication: An Adaptive Approach	2015	SIGMOD	6.0400801e-05
2,385	Leveraging Aggregate Constraints For Deduplication	2007	SIGMOD	8.9167648e-05
6,691	Parallel Discrepancy Detection and Incremental Detection	2021	VLDB	4.9573939e-05
3,526	Distributed Data Deduplication	2016	VLDB	7.0056559e-05
7,052	Efficient Discovery of XML Data Redundancies	2006	VLDB	4.8445913e-05
6,047	MDedup: Duplicate Detection with Matching Dependencies	2020	VLDB	5.2355891e-05
942	Framework for Evaluating Clustering Algorithms in Duplicate Detection	2009	VLDB	0.00015143877
279	Eliminating Fuzzy Duplicates in Data Warehouses	2002	VLDB	0.00029141798
3,366	Modeling and Querying Possible Repairs in Duplicate Detection	2009	VLDB	7.1671634e-05
2,591	DogmatiX Tracks down Duplicates in XML	2005	SIGMOD	8.4851409e-05