TxtAlign: Efficient Near-Duplicate Text Alignment Search via Bottom-k Sketches for Plagiarism Detection
Summary: TxtAlign uses bottom-k sketches to estimate passage similarity, grouping the O(n^2) passages of a length-n text into O(nk) sketch-based groups in O(n log n + nk) time. This enables corpus-scale source retrieval: near-duplicate passage pairs are found by comparing sketches across groups. (summarized by gpt-5-nano on Feb 09 2026)
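The similarity estimate that TxtAlign builds on can be illustrated with a standard bottom-k Jaccard estimator. The following is a minimal sketch, not TxtAlign's implementation, and it does not cover the paper's grouping step. It assumes a passage is treated as the set of its whitespace-separated tokens and uses BLAKE2b as a stand-in hash function. The names `token_hash`, `bottom_k` and `estimate_jaccard` are hypothetical.

```python
import hashlib
import heapq


def token_hash(token: str, seed: int = 0) -> int:
    # Deterministic 64-bit hash; BLAKE2b is an assumed stand-in, not the paper's hash family.
    digest = hashlib.blake2b(f"{seed}:{token}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def bottom_k(tokens, k: int, seed: int = 0) -> list:
    # Bottom-k sketch: the k smallest hash values over the distinct tokens (sorted ascending).
    return heapq.nsmallest(k, {token_hash(t, seed) for t in tokens})


def estimate_jaccard(sketch_a, sketch_b, k: int) -> float:
    # The k smallest hashes of A | B are exactly the k smallest of sketch_a | sketch_b.
    union_sketch = heapq.nsmallest(k, set(sketch_a) | set(sketch_b))
    if not union_sketch:
        return 0.0  # both passages empty
    in_both = set(sketch_a) & set(sketch_b)
    # Fraction of the union's sketch that comes from tokens present in both passages.
    shared = sum(1 for h in union_sketch if h in in_both)
    return shared / len(union_sketch)


if __name__ == "__main__":
    p1 = "the quick brown fox jumps over the lazy dog".split()
    p2 = "the quick brown cat jumps over the lazy dog".split()
    print(estimate_jaccard(bottom_k(p1, 4), bottom_k(p2, 4), 4))
```

When both passages have at most k distinct tokens, the sketch is the whole token set and the estimate equals the exact Jaccard similarity (barring hash collisions). For longer passages, only the k smallest hashes are kept, so each passage's sketch has constant size regardless of its length.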
Authors
1. Zhizhi Wang
2. Chaoji Zuo
3. Dong Deng
Incoming Citations (Sorted by Pagerank)
Showing 4 of 4 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 7,700 | Near-Duplicate Text Alignment with One Permutation Hashing | 2024 | SIGMOD | 4.6744372e-05 |
| 9,876 | Near-Duplicate Sequence Search at Scale for Large Language Model Memorization Evaluation | 2023 | SIGMOD | 4.2667743e-05 |
| 10,245 | SeDA: Bridging the Gap between Efficient Syntactic and Precise Semantic Search of Similar Passages in Large Text Corpora | 2026 | VLDB | 4.1945683e-05 |
| 10,266 | Near-Duplicate Text Alignment under Weighted Jaccard Similarity | 2026 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 13 of 13 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.