Database Paper Browser


Local Similarity Search for Unstructured Text

Summary: Addresses local similarity search over partially replicated text. Rather than comparing whole documents, it finds pairs of sliding windows that differ by at most τ tokens. It introduces signatures built from token combinations over a partitioned token universe, and combines cost-aware partitioning with overlap-aware reuse to scale to large τ. (summarized by gpt-5-nano on Feb 09 2026)
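To make the problem definition concrete, here is a minimal sketch. It rests on several assumptions that do not come from the paper: windows are treated as token multisets, "differ by at most tau (τ) tokens" means the multiset difference between two equal-length windows has at most τ tokens, and tokens are assigned to groups by CRC32. The filter is a plain pigeonhole filter over a partitioned token universe with 2τ+1 groups and exact-match group signatures. It is a simplified stand-in, not the paper's token-combination signatures, cost-aware partitioning, or overlap-aware reuse. All function names are hypothetical.

```python
from collections import Counter, defaultdict
import zlib


def windows(tokens, w):
    """Yield (start, Counter) for every length-w sliding window."""
    for s in range(len(tokens) - w + 1):
        yield s, Counter(tokens[s:s + w])


def differs_by_at_most(a, b, tau):
    # Assumed distance: tokens of `a` left unmatched in `b` (multiset
    # difference). Both windows hold w tokens, so |a - b| == |b - a|.
    return sum((a - b).values()) <= tau


def part_of(token, k):
    # Hypothetical partitioning: hash each token into one of k groups.
    return zlib.crc32(token.encode("utf-8")) % k


def signatures(window, k):
    # Split the window's multiset by group and emit one (group, sorted tokens)
    # key per group. Empty groups are kept so the filter never misses a pair.
    groups = [[] for _ in range(k)]
    for tok, cnt in window.items():
        groups[part_of(tok, k)].extend([tok] * cnt)
    return [(i, tuple(sorted(g))) for i, g in enumerate(groups)]


def local_similar_pairs(doc_a, doc_b, w, tau):
    # Each unmatched token on either side can spoil at most one group, so at
    # most 2*tau groups differ; with 2*tau + 1 groups one must match exactly.
    k = 2 * tau + 1
    wins_a = dict(windows(doc_a, w))
    index = defaultdict(list)
    for s, win in wins_a.items():
        for sig in signatures(win, k):
            index[sig].append(s)

    result = set()
    for t, win_b in windows(doc_b, w):
        candidates = set()
        for sig in signatures(win_b, k):
            candidates.update(index.get(sig, ()))
        # Verify each candidate against the exact distance.
        for s in candidates:
            if differs_by_at_most(wins_a[s], win_b, tau):
                result.add((s, t))
    return sorted(result)


def brute_force(doc_a, doc_b, w, tau):
    # Reference answer: compare every window pair directly.
    return sorted((s, t)
                  for s, a in windows(doc_a, w)
                  for t, b in windows(doc_b, w)
                  if differs_by_at_most(a, b, tau))


if __name__ == "__main__":
    a = "the quick brown fox jumps over the lazy dog".split()
    b = "a quick brown cat jumps over the lazy dog today".split()
    pairs = local_similar_pairs(a, b, w=4, tau=1)
    print(pairs == brute_force(a, b, w=4, tau=1))
```

Comparing against `brute_force` checks that the filter is complete. Empty groups still yield a signature, which keeps the filter correct but can flood the candidate set for short windows or large τ; choosing how to partition the token universe is where a real system would have to spend its effort.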

Paper ID: 5238
Venue: SIGMOD
Year: 2016
Pagerank: 6.3241139e-05
Overall Rank: 4,250 | 70.44%
DOI: 10.1145/2882903.2915211

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 7 of 7 citing papers.


Outgoing Citations (Sorted by Pagerank)

Showing 18 of 18 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
34 | Similarity Search in High Dimensions via Hashing | 1999 | VLDB | 0.00076637636
250 | Efficient set joins on similarity predicates | 2004 | SIGMOD | 0.00030661988
266 | Efficient Exact Set-Similarity Joins | 2006 | VLDB | 0.00029718727
616 | Copy Detection Mechanisms for Digital Documents | 1995 | SIGMOD | 0.00019108201
705 | Winnowing: Local Algorithms for Document Fingerprinting | 2003 | SIGMOD | 0.00017864657
1,246 | Truth Discovery and Copying Detection in a Dynamic World | 2009 | VLDB | 0.0001307161
1,305 | Bayesian Locality Sensitive Hashing for Fast Similarity Search | 2012 | VLDB | 0.00012687101
1,396 | Can We Beat the Prefix Filtering? An Adaptive Framework for Similarity Join and Search | 2012 | SIGMOD | 0.00012204748
2,592 | Pass-Join: A Partition-based Method for Similarity Joins | 2012 | VLDB | 8.4795761e-05
2,740 | String Similarity Joins: An Experimental Evaluation | 2014 | VLDB | 8.1980628e-05
3,774 | Efficient Exact Edit Similarity Query Processing with the Asymmetric Signature Scheme | 2011 | SIGMOD | 6.7757301e-05
3,868 | An Efficient Filter for Approximate Membership Checking | 2008 | SIGMOD | 6.6822543e-05
5,094 | Global Detection of Complex Copying Relationships Between Sources | 2010 | VLDB | 5.7023083e-05
5,379 | Scalable Ad-hoc Entity Extraction from Text Collections | 2008 | VLDB | 5.5405989e-05
5,536 | On Indexing Error-Tolerant Set Containment | 2010 | SIGMOD | 5.4532734e-05
6,726 | A Pivotal Prefix Based Filtering Algorithm for String Similarity Search | 2014 | SIGMOD | 4.9484027e-05
7,109 | Efficient Similarity Join and Search on Multi-Attribute Data | 2015 | SIGMOD | 4.8292998e-05
9,932 | Local Filtering: Improving the Performance of Approximate Queries on String Collections | 2015 | SIGMOD | 4.2500258e-05

Semantically Similar Papers