Scaling Similarity Joins over Tree-Structured Data
Summary: The paper dynamically decomposes tree objects into threshold-guided subgraphs, so a pair of trees can be pruned without computing the full tree edit distance when the two trees share no common subgraph. A two-layer subgraph index makes the similarity join scalable, delivering up to an order-of-magnitude speedup over prior methods on real and synthetic data. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Yu Tang
2. Yilun Cai
3. Nikos Mamoulis
Incoming Citations (Sorted by Pagerank)
Showing 5 of 5 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 5,615 | A Scalable Index for Top-k Subtree Similarity Queries | 2019 | SIGMOD | 5.4101086e-05 |
| 7,153 | Submodularity of Distributed Join Computation | 2018 | SIGMOD | 4.8153963e-05 |
| 7,215 | SyncSignature: A Simple, Efficient, Parallelizable Framework for Tree Similarity Joins | 2023 | VLDB | 4.7985991e-05 |
| 8,511 | JEDI: These aren't the JSON documents you're looking for... | 2022 | SIGMOD | 4.495029e-05 |
| 10,706 | Extensible and Robust Evaluation of Similarity Queries | 2025 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 6 of 6 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 2,395 | Algebraic Properties of Bag Data Types | 1991 | VLDB | 8.8998019e-05 |
| 2,592 | Pass-Join: A Partition-based Method for Similarity Joins | 2012 | VLDB | 8.4795761e-05 |
| 3,199 | Similarity Evaluation on Tree-structured Data | 2005 | SIGMOD | 7.3927291e-05 |
| 3,301 | RTED: A Robust Algorithm for the Tree Edit Distance | 2012 | VLDB | 7.2515266e-05 |
| 3,862 | A Partition-Based Approach to Structure Similarity Search | 2014 | VLDB | 6.687769e-05 |
| 6,807 | Indexing for Subtree Similarity-Search using Edit Distance | 2013 | SIGMOD | 4.9217776e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 5,273 | Correlating XML Data Streams Using Tree-Edit Distance Embeddings | 2003 | PODS | 5.5913399e-05 |
| 8,899 | Fast Approximate Similarity Join in Vector Databases | 2025 | SIGMOD | 4.427232e-05 |
| 1,733 | Efficient Structural Joins on Indexed XML Documents | 2002 | VLDB | 0.00010724888 |
| 250 | Efficient set joins on similarity predicates | 2004 | SIGMOD | 0.00030661988 |
| 7,109 | Efficient Similarity Join and Search on Multi-Attribute Data | 2015 | SIGMOD | 4.8292998e-05 |
| 7,215 | SyncSignature: A Simple, Efficient, Parallelizable Framework for Tree Similarity Joins | 2023 | VLDB | 4.7985991e-05 |
| 4,216 | Trie-Join: Efficient Trie-based String Similarity Joins with Edit-Distance Constraints | 2010 | VLDB | 6.3521675e-05 |
| 5,615 | A Scalable Index for Top-k Subtree Similarity Queries | 2019 | SIGMOD | 5.4101086e-05 |
| 2,784 | Approximate XML Joins | 2002 | SIGMOD | 8.128931e-05 |
| 3,199 | Similarity Evaluation on Tree-structured Data | 2005 | SIGMOD | 7.3927291e-05 |