The Evolution of the Web and Implications for an Incremental Crawler
Summary: Proposes incremental, selective updating of crawled pages, rather than periodic batch refresh, to keep an index and local collection fresh. An empirical study of 0.5M pages over 4 months characterizes how pages evolve, compares crawl strategies, and motivates a hybrid architecture that combines the best design choices for timelier updates. (summarized by gpt-5-nano on Feb 09 2026)
Incoming Citations (Sorted by Pagerank)
Showing 4 of 4 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 234 | Crawling the Hidden Web | 2001 | VLDB | 0.00032018108 |
| 8,320 | Effective Change Detection Using Sampling | 2002 | VLDB | 4.5435639e-05 |
| 12,333 | NEAR-Miner: Mining Evolution Associations of Web Site Directories for Efficient Maintenance of Web Archives | 2009 | VLDB | 4.1945683e-05 |
| 13,527 | Dealing with Web Data: History and Look ahead | 2010 | VLDB | - |
Outgoing Citations (Sorted by Pagerank)
Showing 1 of 1 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1,304 | Synchronizing a Database to Improve Freshness | 2000 | SIGMOD | 0.00012691283 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 5,655 | Personalized PageRank on Evolving Graphs with an Incremental Index-Update Scheme | 2023 | SIGMOD | 5.387631e-05 |
| 234 | Crawling the Hidden Web | 2001 | VLDB | 0.00032018108 |
| 12,231 | Optimizing Content Freshness of Relations Extracted From the Web Using Keyword Search | 2010 | SIGMOD | 4.1945683e-05 |
| 9,548 | Optimal Algorithms for Crawling a Hidden Database in the Web | 2012 | VLDB | 4.3258142e-05 |
| 8,678 | Progressive Deep Web Crawling Through Keyword Queries For Data Enrichment | 2019 | SIGMOD | 4.4702119e-05 |
| 409 | Focused Crawling Using Context Graphs | 2000 | VLDB | 0.00023944056 |
| 3,683 | Finding Replicated Web Collections | 2000 | SIGMOD | 6.8477289e-05 |
| 5,442 | RankMass Crawler: A Crawler with High Personalized PageRank Coverage Guarantee | 2007 | VLDB | 5.5026403e-05 |
| 7,768 | Accurate and Efficient Crawling for Relevant Websites | 2004 | VLDB | 4.6563056e-05 |
| 13,527 | Dealing with Web Data: History and Look ahead | 2010 | VLDB | - |