Focused Crawling Using Context Graphs
Summary: Introduces a focused crawler that builds a context graph to locate topically relevant pages and the content that co-occurs with them, improving credit assignment along crawl paths. Leverages partial reverse crawling from large search engines to yield significant efficiency gains over standard focused crawling. (summarized by gpt-5-nano on Feb 09 2026)
Incoming Non-self Citations Over Time
Authors
- 1. M. Diligenti
- 2. F. M. Coetzee
- 3. S. Lawrence
- 4. C. L. Giles
- 5. M. Gori
Incoming Citations (Sorted by Pagerank)
Showing 5 of 5 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 234 | Crawling the Hidden Web | 2001 | VLDB | 0.00032018108 |
| 759 | To Search or to Crawl? Towards a Query Optimizer for Text-Centric Tasks | 2006 | SIGMOD | 0.00017064615 |
| 5,442 | RankMass Crawler: A Crawler with High Personalized PageRank Coverage Guarantee | 2007 | VLDB | 5.5026403e-05 |
| 7,768 | Accurate and Efficient Crawling for Relevant Websites | 2004 | VLDB | 4.6563056e-05 |
| 7,899 | Sideway Value Algebra for Object-Relational Databases | 2002 | VLDB | 4.6226726e-05 |
Previous
Page 1 / 1
Next
Outgoing Citations (Sorted by Pagerank)
Showing 1 of 1 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 771 | Distributed Hypertext Resource Discovery Through Examples | 1999 | VLDB | 0.00016887664 |
Previous
Page 1 / 1
Next
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 9,548 | Optimal Algorithms for Crawling a Hidden Database in the Web | 2012 | VLDB | 4.3258142e-05 |
| 1,492 | Distributed Search over the Hidden Web: Hierarchical Database Sampling and Selection | 2002 | VLDB | 0.00011694396 |
| 1,537 | Google's Deep-Web Crawl | 2008 | VLDB | 0.00011465704 |
| 6,928 | The Evolution of the Web and Implications for an Incremental Crawler | 2000 | VLDB | 4.8925595e-05 |
| 12,615 | The BINGO! System for Information Portal Generation and Expert Web Search | 2003 | CIDR | 4.1945683e-05 |
| 5,442 | RankMass Crawler: A Crawler with High Personalized PageRank Coverage Guarantee | 2007 | VLDB | 5.5026403e-05 |
| 234 | Crawling the Hidden Web | 2001 | VLDB | 0.00032018108 |
| 8,678 | Progressive Deep Web Crawling Through Keyword Queries For Data Enrichment | 2019 | SIGMOD | 4.4702119e-05 |
| 759 | To Search or to Crawl? Towards a Query Optimizer for Text-Centric Tasks | 2006 | SIGMOD | 0.00017064615 |
| 7,768 | Accurate and Efficient Crawling for Relevant Websites | 2004 | VLDB | 4.6563056e-05 |