A Two-Level Signature Scheme for Stable Set Similarity Joins
Summary: TwoL: a two-level hybrid signature framework for exact set similarity joins that detects unselective primary signatures and substitutes more selective secondary signatures to improve pruning. A cost model trades overhead vs selectivity; Jaccard and Hamming implementations give consistent SOTA gains on 13 diverse datasets. (summarized by gpt-5-mini on Feb 09 2026)
Incoming Non-self Citations Over Time
No non-self incoming citations found for this paper in this database.
Authors
- 1. Daniel Schmitt
- 2. Daniel Kocher
- 3. Nikolaus Augsten
- 4. Willi Mann
- 5. Alexander Miller
Incoming Citations (Sorted by Pagerank)
Showing 2 of 2 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 10,245 | SeDA: Bridging the Gap between Efficient Syntactic and Precise Semantic Search of Similar Passages in Large Text Corpora | 2026 | VLDB | 4.1945683e-05 |
| 10,706 | Extensible and Robust Evaluation of Similarity Queries | 2025 | VLDB | 4.1945683e-05 |
Previous
Page 1 / 1
Next
Outgoing Citations (Sorted by Pagerank)
Showing 11 of 11 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
Previous
Page 1 / 1
Next
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 4,808 | On the Complexity of Inner Product Similarity Join | 2016 | PODS | 5.908896e-05 |
| 9,563 | Towards a Unified Framework for String Similarity Joins | 2019 | VLDB | 4.3254416e-05 |
| 6,241 | Scaling Similarity Joins over Tree-Structured Data | 2015 | VLDB | 5.1411469e-05 |
| 11,305 | TokenJoin: Efficient Filtering for Set Similarity Join with Maximum Weighted Bipartite Matching | 2023 | VLDB | 4.1945683e-05 |
| 266 | Efficient Exact Set-Similarity Joins | 2006 | VLDB | 0.00029718727 |
| 250 | Efficient set joins on similarity predicates | 2004 | SIGMOD | 0.00030661988 |
| 7,215 | SyncSignature: A Simple, Efficient, Parallelizable Framework for Tree Similarity Joins | 2023 | VLDB | 4.7985991e-05 |
| 4,353 | Overlap Set Similarity Joins with Theoretical Guarantees | 2018 | SIGMOD | 6.263585e-05 |
| 3,459 | An Empirical Evaluation of Set Similarity Join Techniques | 2016 | VLDB | 7.072508e-05 |
| 3,490 | Leveraging Set Relations in Exact Set Similarity Join | 2017 | VLDB | 7.0465856e-05 |