Database Paper Browser


Near-Duplicate Text Alignment with One Permutation Hashing

Summary: Compact windows built on one permutation hashing (OPH) compress all O(n^2 k) min-hashes of a text into O(n + k) space for near-duplicate text alignment under Jaccard similarity, without enumerating the min-hashes one by one. An efficient algorithm then derives every sketch similar to the query directly from the OPH compact windows and, with three optimizations, reduces index cost and query latency on real datasets. (summarized by gpt-5-nano on Feb 09 2026)
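For readers unfamiliar with OPH, the following is a minimal sketch of plain one permutation hashing and the usual matched-bins Jaccard estimate. It does not implement the paper's compact windows or its query algorithm. The function names (`token_hash`, `oph_sketch`, `estimate_jaccard`), the choice of BLAKE2b as the hash function, and the 32-bit hash range are all assumptions made for illustration.

```python
import hashlib


def token_hash(token: str, bits: int = 32) -> int:
    """Map a token to an integer in [0, 2**bits). BLAKE2b is an illustrative choice."""
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


def oph_sketch(tokens, k: int, bits: int = 32):
    """Build a k-bin OPH sketch using a single hash function.

    The hash range is split into k contiguous bins and each bin keeps the
    smallest hash value that falls into it. Empty bins stay None
    (no densification is applied in this sketch).
    """
    bin_width = max(1, (1 << bits) // k)
    sketch = [None] * k
    for token in set(tokens):  # Jaccard is defined on sets, so drop duplicates
        h = token_hash(token, bits)
        b = min(h // bin_width, k - 1)  # clamp when k does not divide 2**bits
        if sketch[b] is None or h < sketch[b]:
            sketch[b] = h
    return sketch


def estimate_jaccard(sketch_a, sketch_b) -> float:
    """Estimate Jaccard as matched bins / (k - jointly empty bins)."""
    matched = 0
    jointly_empty = 0
    for x, y in zip(sketch_a, sketch_b):
        if x is None and y is None:
            jointly_empty += 1
        elif x is not None and x == y:
            matched += 1
    usable = len(sketch_a) - jointly_empty
    return matched / usable if usable else 0.0


if __name__ == "__main__":
    query = "the quick brown fox jumps over the lazy dog".split()
    passage = "the quick brown fox leaps over a lazy dog".split()
    k = 16
    print(estimate_jaccard(oph_sketch(query, k), oph_sketch(passage, k)))
```

Because one hash function fills all k bins, building a sketch costs one hash per token rather than k, which is the property that makes OPH attractive when sketches are needed for many subsequences of a long text.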

Paper ID: 6966
Venue: SIGMOD
Year: 2024
Pagerank: 4.6744372e-05
Overall Rank: 7,700 | 46.44%
DOI: 10.1145/3677136


Authors

Incoming Citations (Sorted by Pagerank)

Showing 3 of 3 citing papers.


Outgoing Citations (Sorted by Pagerank)

Showing 16 of 16 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
34 | Similarity Search in High Dimensions via Hashing | 1999 | VLDB | 0.00076637636
400 | Multi-Probe LSH: Efficient Indexing for High-Dimensional Similarity Search | 2007 | VLDB | 0.0002427237
562 | Query-Aware Locality-Sensitive Hashing for Approximate Nearest Neighbor Search | 2016 | VLDB | 0.00020091752
605 | Locality-Sensitive Hashing Scheme Based on Dynamic Collision Counting | 2012 | SIGMOD | 0.000193396
616 | Copy Detection Mechanisms for Digital Documents | 1995 | SIGMOD | 0.00019108201
682 | Quality and Efficiency in High Dimensional Nearest Neighbor Search | 2009 | SIGMOD | 0.00018201541
705 | Winnowing: Local Algorithms for Document Fingerprinting | 2003 | SIGMOD | 0.00017864657
867 | SRS: Solving c-Approximate Nearest Neighbor Queries in High Dimensional Euclidean Space with a Tiny Index | 2015 | VLDB | 0.00015792021
1,234 | Ed-Join: An Efficient Algorithm for Similarity Joins With Edit Distance Constraints | 2008 | VLDB | 0.00013122499
2,592 | Pass-Join: A Partition-based Method for Similarity Joins | 2012 | VLDB | 8.4795761e-05
3,490 | Leveraging Set Relations in Exact Set Similarity Join | 2017 | VLDB | 7.0465856e-05
4,250 | Local Similarity Search for Unstructured Text | 2016 | SIGMOD | 6.3241139e-05
5,073 | Faerie: Efficient Filtering Algorithms for Approximate Dictionary-based Entity Extraction | 2011 | SIGMOD | 5.7177424e-05
7,635 | Allign: Aligning All-Pair Near-Duplicate Passages in Long Texts | 2021 | SIGMOD | 4.6908858e-05
8,291 | TxtAlign: Efficient Near-Duplicate Text Alignment Search via Bottom-k Sketches for Plagiarism Detection | 2022 | SIGMOD | 4.5435639e-05
9,876 | Near-Duplicate Sequence Search at Scale for Large Language Model Memorization Evaluation | 2023 | SIGMOD | 4.2667743e-05

Semantically Similar Papers