Database Paper Browser

Approximate String Joins in a Database (Almost) for Free

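Summary: Proposes a way to run approximate string joins on top of commercial database systems using q-grams (short substrings of length q). The technique takes into account both the positions of individual q-gram matches and the total number of matches, so the approximate-match predicate can be rewritten as an ordinary relational expression. Reports performance gains over evaluating the predicate directly with UDFs, for both full-string and substring joins, validated on real data with a prototype. (summarized by gpt-5-nano on Feb 09 2026)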
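
The idea in the summary can be illustrated with a minimal Python sketch. It is not the paper's implementation (which expresses the filters as a relational query); it is an in-memory illustration under stated assumptions. The function names (positional_qgrams, passes_filters, edit_distance, approx_join), the padding characters '#' and '$', and the default q=3 are illustrative choices. The count threshold used, max(|s1|, |s2|) - 1 - (k - 1) * q, is the commonly cited q-gram bound for strings within edit distance k. The filter is conservative: it may let through pairs that fail verification, but it should not drop true matches.

```python
def positional_qgrams(s, q=3):
    """Return (position, q-gram) pairs for s, padded with q-1 sentinel characters."""
    padded = "#" * (q - 1) + s + "$" * (q - 1)
    return [(i, padded[i:i + q]) for i in range(len(padded) - q + 1)]


def passes_filters(s1, s2, k, q=3):
    """Cheap candidate test: length, position, and count filtering on q-grams."""
    # Length filter: strings within edit distance k differ in length by at most k.
    if abs(len(s1) - len(s2)) > k:
        return False
    g1 = positional_qgrams(s1, q)
    g2 = positional_qgrams(s2, q)
    # Position filter: only count equal q-grams whose positions differ by at most k.
    matches = sum(
        1
        for (i, g) in g1
        for (j, h) in g2
        if g == h and abs(i - j) <= k
    )
    # Count filter: assumed lower bound on shared q-grams for edit distance <= k.
    threshold = max(len(s1), len(s2)) - 1 - (k - 1) * q
    return matches >= threshold


def edit_distance(a, b):
    """Standard dynamic-programming Levenshtein distance (the expensive check)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def approx_join(left, right, k, q=3):
    """Filter candidate pairs with q-grams, then verify survivors exactly."""
    return [
        (s1, s2)
        for s1 in left
        for s2 in right
        if passes_filters(s1, s2, k, q) and edit_distance(s1, s2) <= k
    ]


if __name__ == "__main__":
    print(approx_join(["kitten", "database"], ["sitten", "databse"], k=1))
```

The nested loops here stand in for what a database would do with a join on q-gram tables; the point of the sketch is only that the expensive edit-distance check runs on the pairs that survive the cheap filters.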

Paper ID: 8773
Venue: VLDB
Year: 2001
Pagerank: 0.00044847972
Overall Rank: 125 | 99.14%
DOI: -

Authors

Incoming Citations (Sorted by Pagerank)

Showing 50 of 77 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
155 | Robust and Efficient Fuzzy Match for Online Data Cleaning | 2003 | SIGMOD | 0.00040637896
199 | Declarative Data Cleaning: Language, Model, and Algorithms | 2001 | VLDB | 0.00035041015
250 | Efficient set joins on similarity predicates | 2004 | SIGMOD | 0.00030661988
251 | Robust and Fast Similarity Search for Moving Object Trajectories | 2005 | SIGMOD | 0.00030644658
266 | Efficient Exact Set-Similarity Joins | 2006 | VLDB | 0.00029718727
280 | Eliminating Fuzzy Duplicates in Data Warehouses | 2002 | VLDB | 0.00029113044
447 | Efficient Parallel Set-Similarity Joins Using MapReduce | 2010 | SIGMOD | 0.00022900171
475 | Mining Database Structure; Or, How to Build a Data Quality Browser | 2002 | SIGMOD | 0.00022303253
509 | On Active Learning of Record Matching Packages | 2010 | SIGMOD | 0.00021409518
1,202 | VGRAM: Improving Performance of Approximate Queries on String Collections Using Variable-Length Grams | 2007 | VLDB | 0.00013326298
1,234 | Ed-Join: An Efficient Algorithm for Similarity Joins With Edit Distance Constraints | 2008 | VLDB | 0.00013122499
1,395 | Structured Querying of Web Text: A Technical Challenge | 2007 | CIDR | 0.00012207039
1,396 | Can We Beat the Prefix Filtering? An Adaptive Framework for Similarity Join and Search | 2012 | SIGMOD | 0.00012204748
1,533 | Example-driven Design of Efficient Record Matching Queries | 2007 | VLDB | 0.00011471971
2,073 | Extending Autocompletion To Tolerate Errors | 2009 | SIGMOD | 9.6142791e-05
2,193 | Cost-Based Variable-Length-Gram Selection for String Collections to Support Approximate Queries Efficiently | 2008 | SIGMOD | 9.3178557e-05
2,376 | Bed-Tree: An All-Purpose Index Structure for String Similarity Search Based on Edit Distance | 2010 | SIGMOD | 8.9424361e-05
2,514 | Comparative Analysis of Approximate Blocking Techniques for Entity Resolution | 2016 | VLDB | 8.6139012e-05
2,592 | Pass-Join: A Partition-based Method for Similarity Joins | 2012 | VLDB | 8.4795761e-05
2,740 | String Similarity Joins: An Experimental Evaluation | 2014 | VLDB | 8.1980628e-05
2,779 | Hashed Samples: Selectivity Estimators For Set Similarity Selection Queries | 2008 | VLDB | 8.1320575e-05
2,784 | Approximate XML Joins | 2002 | SIGMOD | 8.128931e-05
3,140 | ZeroER: Entity Resolution using Zero Labeled Examples | 2020 | SIGMOD | 7.4841763e-05
3,199 | Similarity Evaluation on Tree-structured Data | 2005 | SIGMOD | 7.3927291e-05
3,226 | Extending Q-Grams to Estimate Selectivity of String Matching with Low Edit Distance | 2007 | VLDB | 7.3433307e-05
3,267 | Benchmarking Declarative Approximate Selection Predicates | 2007 | SIGMOD | 7.3058429e-05
3,514 | Spatio-Textual Similarity Joins | 2013 | VLDB | 7.0226998e-05
3,529 | Merging the Results of Approximate Match Operations | 2004 | VLDB | 7.0059524e-05
3,578 | Efficient Approximate Entity Extraction with Edit Distance Constraints | 2009 | SIGMOD | 6.9503858e-05
3,774 | Efficient Exact Edit Similarity Query Processing with the Asymmetric Signature Scheme | 2011 | SIGMOD | 6.7757301e-05
3,868 | An Efficient Filter for Approximate Membership Checking | 2008 | SIGMOD | 6.6822543e-05
3,977 | BLAST: a Loosely Schema-aware Meta-blocking Approach for Entity Resolution | 2016 | VLDB | 6.5736268e-05
4,216 | Trie-Join: Efficient Trie-based String Similarity Joins with Edit-Distance Constraints | 2010 | VLDB | 6.3521675e-05
4,402 | Smurf: Self-Service String Matching Using Random Forests | 2019 | VLDB | 6.2195162e-05
4,406 | Approximate Matching of Hierarchical Data Using pq-Grams | 2005 | VLDB | 6.2141638e-05
4,435 | Sampling Dirty Data for Matching Attributes | 2010 | SIGMOD | 6.1918164e-05
4,438 | Selectivity Estimation for Fuzzy String Predicates in Large Data Sets | 2005 | VLDB | 6.1898903e-05
4,684 | Approximate String Joins with Abbreviations | 2018 | VLDB | 6.0006406e-05
4,901 | Probabilistic String Similarity Joins | 2010 | SIGMOD | 5.8411648e-05
4,974 | Supervised Meta-blocking | 2014 | VLDB | 5.7903293e-05
4,988 | Incremental Maintenance of Length Normalized Indexes for Approximate String Matching | 2009 | SIGMOD | 5.783959e-05
4,989 | BEER: Blocking for Effective Entity Resolution | 2021 | SIGMOD | 5.7827362e-05
5,073 | Faerie: Efficient Filtering Algorithms for Approximate Dictionary-based Entity Extraction | 2011 | SIGMOD | 5.7177424e-05
5,151 | String Similarity Measures and Joins with Synonyms | 2013 | SIGMOD | 5.6609851e-05
5,228 | Schema-agnostic vs Schema-based Configurations for Blocking Methods on Homogeneous Data | 2016 | VLDB | 5.6158315e-05
5,232 | SEAL: Spatio-Textual Similarity Search | 2012 | VLDB | 5.6136151e-05
5,291 | Fast Subtrajectory Similarity Search in Road Networks under Weighted Edit Distance Constraints | 2020 | VLDB | 5.5826473e-05
5,536 | On Indexing Error-Tolerant Set Containment | 2010 | SIGMOD | 5.4532734e-05
5,794 | Discovering Related Data At Scale | 2021 | VLDB | 5.3245122e-05
5,887 | Efficient Approximate Search on String Collections (Tutorial) | 2009 | VLDB | 5.2879769e-05

Outgoing Citations (Sorted by Pagerank)

Showing 4 of 4 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
