Local Similarity Search for Unstructured Text
Summary: This paper studies local similarity search for partially replicated text. Rather than comparing whole documents, it finds pairs of sliding windows that differ by at most tau tokens. It introduces signatures built from token combinations over a partitioned token universe, and uses cost-aware partitioning and overlap-aware reuse to scale to large tau. (summarized by gpt-5-nano on Feb 09 2026)
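To make the problem statement concrete, here is a minimal brute-force sketch of what "pairs of sliding windows differing by at most tau tokens" could mean. It is not the paper's signature-based method; it only illustrates the search target that such a method would need to accelerate. The window length `w`, the function names, and the choice to measure difference as `w` minus the multiset overlap of two windows are assumptions made for illustration and are not taken from the summary.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def window_difference(a: Sequence[str], b: Sequence[str]) -> int:
    """Tokens of window `a` left unmatched by window `b` (multiset view).

    Assumption: both windows have the same length, so the measure is symmetric.
    """
    overlap = sum((Counter(a) & Counter(b)).values())
    return len(a) - overlap


def local_similar_pairs(
    doc_a: Sequence[str], doc_b: Sequence[str], w: int, tau: int
) -> List[Tuple[int, int]]:
    """Return start offsets (i, j) of all window pairs within `tau` tokens.

    Brute force: compares every length-`w` window of doc_a with every
    length-`w` window of doc_b, so the cost is quadratic in document length.
    """
    pairs = []
    for i in range(len(doc_a) - w + 1):
        window_a = doc_a[i:i + w]
        for j in range(len(doc_b) - w + 1):
            if window_difference(window_a, doc_b[j:j + w]) <= tau:
                pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    a = "the quick brown fox jumps over the lazy dog".split()
    b = "a quick brown cat jumps over a fence".split()
    print(local_similar_pairs(a, b, w=4, tau=1))
```

The quadratic number of window comparisons is what makes the naive approach impractical on long texts, which is presumably the motivation for the signature and partitioning techniques named in the summary.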
Authors
1. Pei Wang
2. Chuan Xiao
3. Jianbin Qin
4. Wei Wang
5. Xiaoyang Zhang
6. Yoshiharu Ishikawa
Incoming Citations (Sorted by Pagerank)
Showing 7 of 7 citing papers.
| Overall Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 6,074 | Pigeonring: A Principle for Faster Thresholded Similarity Search | 2019 | VLDB | 5.2242306e-05 |
| 7,635 | Allign: Aligning All-Pair Near-Duplicate Passages in Long Texts | 2021 | SIGMOD | 4.6908858e-05 |
| 7,700 | Near-Duplicate Text Alignment with One Permutation Hashing | 2024 | SIGMOD | 4.6744372e-05 |
| 8,291 | TxtAlign: Efficient Near-Duplicate Text Alignment Search via Bottom-k Sketches for Plagiarism Detection | 2022 | SIGMOD | 4.5435639e-05 |
| 9,876 | Near-Duplicate Sequence Search at Scale for Large Language Model Memorization Evaluation | 2023 | SIGMOD | 4.2667743e-05 |
| 10,266 | Near-Duplicate Text Alignment under Weighted Jaccard Similarity | 2026 | VLDB | 4.1945683e-05 |
| 11,247 | A Two-Level Signature Scheme for Stable Set Similarity Joins | 2023 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 18 of 18 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 3,862 | A Partition-Based Approach to Structure Similarity Search | 2014 | VLDB | 6.687769e-05 |
| 10,266 | Near-Duplicate Text Alignment under Weighted Jaccard Similarity | 2026 | VLDB | 4.1945683e-05 |
| 1,033 | Determining Text Databases to Search in the Internet | 1998 | VLDB | 1.4543835e-04 |
| 3,514 | Spatio-Textual Similarity Joins | 2013 | VLDB | 7.0226998e-05 |
| 11,504 | LES3: Learning-based Exact Set Similarity Search | 2021 | VLDB | 4.1945683e-05 |
| 7,700 | Near-Duplicate Text Alignment with One Permutation Hashing | 2024 | SIGMOD | 4.6744372e-05 |
| 3,609 | Similarity search in the blink of an eye with compressed indices | 2023 | VLDB | 6.9215236e-05 |
| 3,774 | Efficient Exact Edit Similarity Query Processing with the Asymmetric Signature Scheme | 2011 | SIGMOD | 6.7757301e-05 |
| 8,291 | TxtAlign: Efficient Near-Duplicate Text Alignment Search via Bottom-k Sketches for Plagiarism Detection | 2022 | SIGMOD | 4.5435639e-05 |
| 14,300 | Unstructured Data Bases or Very Efficient Text Searching | 1983 | PODS | - |