Large-Scale Copy Detection
Summary: A survey of large-scale copy detection across text, images, videos, code, and structured data that contrasts techniques across these modalities. It addresses scalability, efficient detection, and open problems for data-management researchers. (summarized by gpt-5-nano on Feb 09 2026)
Incoming Non-self Citations Over Time
No non-self incoming citations found for this paper in this database.
Authors
Incoming Citations (Sorted by Pagerank)
Showing 1 of 1 citing paper.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 908 | Fusing Data with Correlations | 2014 | SIGMOD | 0.00015431241 |
Outgoing Citations (Sorted by Pagerank)
Showing 7 of 7 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 616 | Copy Detection Mechanisms for Digital Documents | 1995 | SIGMOD | 0.00019108201 |
| 705 | Winnowing: Local Algorithms for Document Fingerprinting | 2003 | SIGMOD | 0.00017864657 |
| 855 | Integrating Conflicting Data: The Role of Source Dependence | 2009 | VLDB | 0.00015906735 |
| 1,246 | Truth Discovery and Copying Detection in a Dynamic World | 2009 | VLDB | 0.0001307161 |
| 5,094 | Global Detection of Complex Copying Relationships Between Sources | 2010 | VLDB | 0.000057023083 |
| 5,600 | BibFinder/StatMiner: Effectively Mining and Using Coverage and Overlap Statistics in Data Integration | 2003 | VLDB | 0.000054160529 |
| 7,229 | Sailing the Information Ocean with Awareness of Currents: Discovery and Application of Source Dependence | 2009 | CIDR | 0.000047950172 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1,851 | An Analysis of Structured Data on the Web | 2012 | VLDB | 0.00010327871 |
| 8,291 | TxtAlign: Efficient Near-Duplicate Text Alignment Search via Bottom-k Sketches for Plagiarism Detection | 2022 | SIGMOD | 0.000045435639 |
| 4,137 | Exploiting Content Redundancy for Web Information Extraction | 2010 | VLDB | 0.000064181549 |
| 14,300 | Unstructured Data Bases or Very Efficient Text Searching | 1983 | PODS | - |
| 13,926 | Clustering Methods for Large Databases: From the Past to the Future | 1999 | SIGMOD | - |
| 705 | Winnowing: Local Algorithms for Document Fingerprinting | 2003 | SIGMOD | 0.00017864657 |
| 4,250 | Local Similarity Search for Unstructured Text | 2016 | SIGMOD | 0.000063241139 |
| 3,683 | Finding replicated web collections | 2000 | SIGMOD | 0.000068477289 |
| 616 | Copy Detection Mechanisms for Digital Documents | 1995 | SIGMOD | 0.00019108201 |
| 5,094 | Global Detection of Complex Copying Relationships Between Sources | 2010 | VLDB | 5.7023083e-05 |