Database Paper Browser

Can We Beat the Prefix Filtering? An Adaptive Framework for Similarity Join and Search

Summary: An adaptive framework for similarity join and search that uses a cost model to select a prefix for each object, rather than relying on a fixed prefix-filtering scheme. Efficient indexes support this dynamic prefix selection, and the approach yields gains over traditional prefix-filtering baselines. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 4514
Venue: SIGMOD
Year: 2012
Pagerank: 0.00012204748
Overall Rank: 1,396 | 90.29%
DOI: -

Incoming Citations (Sorted by Pagerank)

Showing 41 of 41 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
1,187 | JOSIE: Overlap Set Similarity Search for Finding Joinable Tables in Data Lakes | 2019 | SIGMOD | 0.00013443639
2,641 | Locality-Sensitive Hashing for Earthquake Detection: A Case Study of Scaling Data-Driven Science | 2018 | VLDB | 8.3905374e-05
2,730 | Open Data Integration | 2018 | VLDB | 8.2126735e-05
2,740 | String Similarity Joins: An Experimental Evaluation | 2014 | VLDB | 8.1980628e-05
3,263 | QASCA: A Quality-Aware Task Assignment System for Crowdsourcing Applications | 2015 | SIGMOD | 7.3097573e-05
3,459 | An Empirical Evaluation of Set Similarity Join Techniques | 2016 | VLDB | 7.072508e-05
3,490 | Leveraging Set Relations in Exact Set Similarity Join | 2017 | VLDB | 7.0465856e-05
4,050 | An Efficient Partition Based Method for Exact Set Similarity Joins | 2016 | VLDB | 6.4953612e-05
4,250 | Local Similarity Search for Unstructured Text | 2016 | SIGMOD | 6.3241139e-05
4,353 | Overlap Set Similarity Joins with Theoretical Guarantees | 2018 | SIGMOD | 6.263585e-05
4,684 | Approximate String Joins with Abbreviations | 2018 | VLDB | 6.0006406e-05
4,808 | On the Complexity of Inner Product Similarity Join | 2016 | PODS | 5.908896e-05
5,151 | String Similarity Measures and Joins with Synonyms | 2013 | SIGMOD | 5.6609851e-05
5,365 | Question Answering Over Knowledge Graphs: Question Understanding Via Template Decomposition | 2018 | VLDB | 5.5461187e-05
5,434 | Auto-FuzzyJoin: Auto-Program Fuzzy Similarity Joins Without Labeled Examples | 2021 | SIGMOD | 5.5045402e-05
5,469 | Learned Cardinality Estimation for Similarity Queries | 2021 | SIGMOD | 5.4898192e-05
6,074 | Pigeonring: A Principle for Faster Thresholded Similarity Search | 2019 | VLDB | 5.2242306e-05
6,605 | Dima: A Distributed In-Memory Similarity-Based Query Processing System | 2017 | VLDB | 4.9965703e-05
6,726 | A Pivotal Prefix Based Filtering Algorithm for String Similarity Search | 2014 | SIGMOD | 4.9484027e-05
7,109 | Efficient Similarity Join and Search on Multi-Attribute Data | 2015 | SIGMOD | 4.8292998e-05
7,215 | SyncSignature: A Simple, Efficient, Parallelizable Framework for Tree Similarity Joins | 2023 | VLDB | 4.7985991e-05
7,588 | Scalable Column Concept Determination for Web Tables Using Large Knowledge Bases | 2013 | VLDB | 4.7030914e-05
7,635 | Allign: Aligning All-Pair Near-Duplicate Passages in Long Texts | 2021 | SIGMOD | 4.6908858e-05
7,668 | Human-in-the-loop Data Integration | 2017 | VLDB | 4.6834075e-05
8,291 | TxtAlign: Efficient Near-Duplicate Text Alignment Search via Bottom-k Sketches for Plagiarism Detection | 2022 | SIGMOD | 4.5435639e-05
8,618 | Nexus: Correlation Discovery over Collections of Spatio-Temporal Tabular Data | 2024 | SIGMOD | 4.4838259e-05
9,439 | On-the-Fly Token Similarity Joins in Relational Databases | 2014 | SIGMOD | 4.3423824e-05
9,563 | Towards a Unified Framework for String Similarity Joins | 2019 | VLDB | 4.3254416e-05
9,832 | Balance-Aware Distributed String Similarity-Based Query Processing System | 2019 | VLDB | 4.2751057e-05
9,876 | Near-Duplicate Sequence Search at Scale for Large Language Model Memorization Evaluation | 2023 | SIGMOD | 4.2667743e-05
9,932 | Local Filtering: Improving the Performance of Approximate Queries on String Collections | 2015 | SIGMOD | 4.2500258e-05
9,933 | Efficient and Effective KNN Sequence Search with Approximate n-grams | 2014 | VLDB | 4.2500258e-05
10,706 | Extensible and Robust Evaluation of Similarity Queries | 2025 | VLDB | 4.1945683e-05
11,087 | Dealing with Acronyms, Abbreviations, and Typos in Real-World Entity Matching | 2024 | VLDB | 4.1945683e-05
11,175 | Grouping Time Series for Efficient Columnar Storage | 2023 | SIGMOD | 4.1945683e-05
11,247 | A Two-Level Signature Scheme for Stable Set Similarity Joins | 2023 | VLDB | 4.1945683e-05
11,305 | TokenJoin: Efficient Filtering for Set Similarity Join with Maximum Weighted Bipartite Matching | 2023 | VLDB | 4.1945683e-05
11,347 | OpenTFV: An Open Domain Table-Based Fact Verification System | 2022 | SIGMOD | 4.1945683e-05
11,504 | LES3: Learning-based Exact Set Similarity Search | 2021 | VLDB | 4.1945683e-05
11,724 | ZigZag: Supporting Similarity Queries on Vector Space Models | 2018 | SIGMOD | 4.1945683e-05
12,086 | RCSI: Scalable similarity search in thousand(s) of genomes | 2013 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 18 of 18 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
125 | Approximate String Joins in a Database (Almost) for Free | 2001 | VLDB | 0.00044847972
155 | Robust and Efficient Fuzzy Match for Online Data Cleaning | 2003 | SIGMOD | 0.00040637896
250 | Efficient set joins on similarity predicates | 2004 | SIGMOD | 0.00030661988
266 | Efficient Exact Set-Similarity Joins | 2006 | VLDB | 0.00029718727
447 | Efficient Parallel Set-Similarity Joins Using MapReduce | 2010 | SIGMOD | 0.00022900171
1,202 | VGRAM: Improving Performance of Approximate Queries on String Collections Using Variable-Length Grams | 2007 | VLDB | 0.00013326298
1,234 | Ed-Join: An Efficient Algorithm for Similarity Joins With Edit Distance Constraints | 2008 | VLDB | 0.00013122499
2,024 | ATLAS: A Probabilistic Algorithm for High Dimensional Similarity Search | 2011 | SIGMOD | 9.7519678e-05
2,213 | n-Gram/2L: A Space and Time Efficient Two-Level n-Gram Inverted Index Structure | 2005 | VLDB | 9.2765152e-05
2,376 | Bed-Tree: An All-Purpose Index Structure for String Similarity Search Based on Edit Distance | 2010 | SIGMOD | 8.9424361e-05
2,592 | Pass-Join: A Partition-based Method for Similarity Joins | 2012 | VLDB | 8.4795761e-05
2,779 | Hashed Samples: Selectivity Estimators For Set Similarity Selection Queries | 2008 | VLDB | 8.1320575e-05
3,774 | Efficient Exact Edit Similarity Query Processing with the Asymmetric Signature Scheme | 2011 | SIGMOD | 6.7757301e-05
4,216 | Trie-Join: Efficient Trie-based String Similarity Joins with Edit-Distance Constraints | 2010 | VLDB | 6.3521675e-05
4,438 | Selectivity Estimation for Fuzzy String Predicates in Large Data Sets | 2005 | VLDB | 6.1898903e-05
4,873 | Power-Law Based Estimation of Set Similarity Join Size | 2009 | VLDB | 5.8602304e-05
5,073 | Faerie: Efficient Filtering Algorithms for Approximate Dictionary-based Entity Extraction | 2011 | SIGMOD | 5.7177424e-05
5,220 | Similarity Join Size Estimation using Locality Sensitive Hashing | 2011 | VLDB | 5.6216111e-05

Semantically Similar Papers