Winnowing: Local Algorithms for Document Fingerprinting
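Summary: Introduces local document fingerprinting algorithms, which are guaranteed to detect any copied passage longer than a chosen threshold in large document collections. Proves a lower bound on the fingerprint density achievable by any local algorithm and presents winnowing, an efficient local algorithm whose density is within 33% of that bound. Validates the approach experimentally on Web data and discusses its use in MOSS, a plagiarism-detection service. (summarized by gpt-5-nano on Feb 09 2026)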
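As a rough illustration of how winnowing selects fingerprints, here is a minimal Python sketch. It assumes the text is split into overlapping k-grams, each k-gram is hashed, and in every window of w consecutive hashes the minimum hash is chosen (the rightmost one on ties) and recorded unless that same position was already recorded. The helper names `kgram_hashes` and `winnow`, and the use of truncated SHA-1 instead of a rolling hash, are hypothetical choices for illustration, not the paper's or MOSS's code.

```python
import hashlib


def kgram_hashes(text: str, k: int) -> list[int]:
    """Hash every k-gram of `text` (hypothetical helper).

    Uses the first 8 bytes of SHA-1 for simplicity; a real system would
    more likely use a rolling hash (assumption, not from the source).
    """
    return [
        int.from_bytes(hashlib.sha1(text[i:i + k].encode()).digest()[:8], "big")
        for i in range(len(text) - k + 1)
    ]


def winnow(hashes: list[int], w: int) -> list[tuple[int, int]]:
    """Select (position, hash) fingerprints with window size w (hypothetical helper).

    In each window of w consecutive hashes, pick the minimum hash,
    breaking ties by taking the rightmost occurrence. With that tie rule
    the selected position never moves left as the window slides, so
    comparing against the last recorded position avoids duplicates.
    Inputs shorter than w produce no windows and hence no fingerprints.
    """
    fingerprints: list[tuple[int, int]] = []
    last_pos = -1
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        min_val = min(window)
        # Rightmost index of the minimum within the current window.
        offset = max(i for i, h in enumerate(window) if h == min_val)
        pos = start + offset
        if pos != last_pos:
            fingerprints.append((pos, min_val))
            last_pos = pos
    return fingerprints
```

For example, `winnow([77, 74, 42, 17, 98, 50, 17, 98, 8, 88, 67, 39, 77, 74, 42, 17, 98], 4)` selects the hashes 17, 17, 8, 39 and 17 at positions 3, 6, 8, 11 and 15. Because any substring of length at least t = w + k - 1 shared by two documents contains a full window of identical hashes, both documents record that window's minimum, which is the detection guarantee. With random hash values, winnowing is expected to keep roughly a 2/(w + 1) fraction of the k-gram hashes.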
Authors
1. Saul Schleimer
2. Daniel S. Wilkerson
3. Alex Aiken
Incoming Citations (Sorted by Pagerank)
Showing 14 of 14 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 1 of 1 cited paper.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 616 | Copy Detection Mechanisms for Digital Documents | 1995 | SIGMOD | 0.00019108201 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 7,635 | Allign: Aligning All-Pair Near-Duplicate Passages in Long Texts | 2021 | SIGMOD | 4.6908858e-05 |
| 12,276 | Parsimonious Linear Fingerprinting for Time Series | 2010 | VLDB | 4.1945683e-05 |
| 2,489 | Signature files: Design and performance comparison of some signature extraction methods. | 1985 | SIGMOD | 8.6619055e-05 |
| 3,683 | Finding replicated web collections | 2000 | SIGMOD | 6.8477289e-05 |
| 5,094 | Global Detection of Complex Copying Relationships Between Sources | 2010 | VLDB | 5.7023083e-05 |
| 14,300 | Unstructured Data Bases or Very Efficient Text Searching | 1983 | PODS | - |
| 8,291 | TxtAlign: Efficient Near-Duplicate Text Alignment Search via Bottom-k Sketches for Plagiarism Detection | 2022 | SIGMOD | 4.5435639e-05 |
| 12,178 | Large-Scale Copy Detection | 2011 | SIGMOD | 4.1945683e-05 |
| 616 | Copy Detection Mechanisms for Digital Documents | 1995 | SIGMOD | 0.00019108201 |
| 4,250 | Local Similarity Search for Unstructured Text | 2016 | SIGMOD | 6.3241139e-05 |