Robust and Efficient Fuzzy Match for Online Data Cleaning

Summary: Proposes a new similarity function that addresses limitations of fuzzy-match metrics used in data cleaning, and develops an efficient fuzzy-match algorithm for validating and cleaning incoming tuples against reference tables in real time. The approach is demonstrated on real datasets. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 3442
Venue: SIGMOD
Year: 2003
Pagerank: 0.00040637896
Overall Rank: 155 | 98.93%
DOI: -

Authors

Incoming Citations (Sorted by Pagerank)

Showing 7 of 57 citing papers.

Rank	Citing Paper	Year	Venue	Pagerank
10,216	The Case For Language Model Approximated LIKE Predicate	2026	SIGMOD	4.1945683e-05
11,162	Towards Better Bounds for Finding Quasi-Identifiers	2023	PODS	4.1945683e-05
11,507	TQEL: Framework for Query-Driven Linking of Top-K Entities in Social Media Blogs	2021	VLDB	4.1945683e-05
12,371	Building a Global Location Search Service	2008	SIGMOD	4.1945683e-05
12,461	Bridging the Application and DBMS Profiling Divide for Database Application Developers	2007	VLDB	4.1945683e-05
12,478	Randomized Algorithms for Data Reconciliation in Wide Area Aggregate Query Processing	2007	VLDB	4.1945683e-05
12,544	SPIDER: Flexible Matching in Databases	2005	SIGMOD	4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 5 of 5 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank	Cited Paper	Year	Venue	Pagerank
67	The Merge/Purge Problem for Large Databases	1995	SIGMOD	0.00061348205
91	M-tree: An Efficient Access Method for Similarity Search in Metric Spaces	1997	VLDB	0.0005181666
125	Approximate String Joins in a Database (Almost) for Free	2001	VLDB	0.00044847972
150	Integration of Heterogeneous Databases Without Common Domains Using Queries Based on Textual Similarity	1998	SIGMOD	0.00041055843
280	Eliminating Fuzzy Duplicates in Data Warehouses	2002	VLDB	0.00029113044

Semantically Similar Papers

Overall Rank	Paper	Year	Venue	Pagerank
266	Efficient Exact Set-Similarity Joins	2006	VLDB	0.00029718727
1,533	Example-driven Design of Efficient Record Matching Queries	2007	VLDB	0.00011471971
11,305	TokenJoin: Efficient Filtering for Set Similarity Join with Maximum Weighted Bipartite Matching	2023	VLDB	4.1945683e-05
7,725	Data Cleaning in Microsoft SQL Server 2005	2005	SIGMOD	4.6670883e-05
5,434	Auto-FuzzyJoin: Auto-Program Fuzzy Similarity Joins Without Labeled Examples	2021	SIGMOD	5.5045402e-05
4,435	Sampling Dirty Data for Matching Attributes	2010	SIGMOD	6.1918164e-05
1,345	Entity Matching: How Similar Is Similar	2011	VLDB	0.00012468408
3,529	Merging the Results of Approximate Match Operations	2004	VLDB	7.0059524e-05
4,026	Flexible String Matching Against Large Databases in Practice	2004	VLDB	6.5169976e-05
280	Eliminating Fuzzy Duplicates in Data Warehouses	2002	VLDB	0.00029113044