Accurate and Efficient Crawling for Relevant Websites
Summary: A two-level focused website crawler that combines graph-based selection of external sites with focused page crawling within each site. It treats websites as first-class units and outperforms prior site-adapted focused crawlers through efficient, targeted intra-site discovery. (summarized by gpt-5-nano on Feb 09 2026)
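To make the two-level structure in the summary concrete, here is a minimal sketch under stated assumptions. It is not the paper's algorithm. The names `crawl`, `crawl_site`, and `page_relevance` are hypothetical, and a toy in-memory web stands in for HTTP fetching. Page relevance is plain keyword overlap. Site selection ranks candidate sites by the summed relevance of already-crawled sites that link to them, which is a simple stand-in for the paper's graph-based external site selection.

```python
import heapq
from urllib.parse import urlparse

# Hypothetical toy web: page URL -> (page text, outgoing links).
# A real crawler would fetch and parse pages over HTTP; this sketch stays offline.
Web = dict[str, tuple[str, list[str]]]


def site_of(url: str) -> str:
    """Treat the host name as the website unit."""
    return urlparse(url).netloc


def page_relevance(text: str, topic: set[str]) -> float:
    """Hypothetical relevance score: fraction of topic terms that appear in the page."""
    if not topic:
        return 0.0
    return len(set(text.lower().split()) & topic) / len(topic)


def crawl_site(web: Web, entry: str, topic: set[str], budget: int):
    """Level 2: best-first crawl inside one site, visiting at most `budget` pages."""
    site = site_of(entry)
    frontier = [(-1.0, entry)]  # min-heap of (negated priority, url)
    seen = {entry}
    visited: list[tuple[str, float]] = []
    external: list[str] = []  # off-site links, in discovery order
    while frontier and len(visited) < budget:
        _, url = heapq.heappop(frontier)
        if url not in web:
            continue  # dead link in the toy web
        text, links = web[url]
        score = page_relevance(text, topic)
        visited.append((url, score))
        for link in links:
            if site_of(link) != site:
                if link not in external:
                    external.append(link)
            elif link not in seen:
                seen.add(link)
                # A child page inherits its parent's relevance as its crawl priority.
                heapq.heappush(frontier, (-score, link))
    return visited, external


def crawl(web: Web, seeds: list[str], topic: set[str],
          max_sites: int, page_budget: int, threshold: float = 0.5) -> list[str]:
    """Level 1: repeatedly pick the unvisited site with the most link evidence."""
    evidence = {site_of(u): 1.0 for u in seeds}  # seeds get a fixed prior
    entry = {site_of(u): u for u in seeds}
    done: set[str] = set()
    relevant: list[str] = []
    while len(done) < max_sites:
        candidates = [s for s in evidence if s not in done]
        if not candidates:
            break
        site = max(candidates, key=lambda s: evidence[s])
        done.add(site)
        pages, external = crawl_site(web, entry[site], topic, page_budget)
        site_rel = max((score for _, score in pages), default=0.0)
        if site_rel >= threshold:
            relevant.append(site)
        for link in external:
            target = site_of(link)
            # Graph evidence: each crawled site vouches for the sites it links to,
            # weighted by how relevant the crawled site turned out to be.
            evidence[target] = evidence.get(target, 0.0) + site_rel
            entry.setdefault(target, link)
    return relevant
```

Splitting the work this way gives each site its own page budget. A large but off-topic site therefore cannot exhaust the crawl, and off-site links feed only the site-level frontier.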
Incoming Citations (Sorted by Pagerank)
Showing 1 of 1 citing paper.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 5,442 | RankMass Crawler: A Crawler with High Personalized PageRank Coverage Guarantee | 2007 | VLDB | 5.5026403e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 3 of 3 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 409 | Focused Crawling Using Context Graphs | 2000 | VLDB | 0.00023944056 |
| 771 | Distributed Hypertext Resource Discovery Through Examples | 1999 | VLDB | 0.00016887664 |
| 1,606 | Enhanced hypertext categorization using hyperlinks | 1998 | SIGMOD | 0.00011174873 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1,492 | Distributed Search over the Hidden Web: Hierarchical Database Sampling and Selection | 2002 | VLDB | 0.00011694396 |
| 1,537 | Google's Deep-Web Crawl | 2008 | VLDB | 0.00011465704 |
| 12,158 | Efficient Verification of Web-Content Searching Through Authenticated Web Crawlers | 2012 | VLDB | 4.1945683e-05 |
| 12,615 | The BINGO! System for Information Portal Generation and Expert Web Search | 2003 | CIDR | 4.1945683e-05 |
| 9,548 | Optimal Algorithms for Crawling a Hidden Database in the Web | 2012 | VLDB | 4.3258142e-05 |
| 5,442 | RankMass Crawler: A Crawler with High Personalized PageRank Coverage Guarantee | 2007 | VLDB | 5.5026403e-05 |
| 6,928 | The Evolution of the Web and Implications for an Incremental Crawler | 2000 | VLDB | 4.8925595e-05 |
| 8,678 | Progressive Deep Web Crawling Through Keyword Queries For Data Enrichment | 2019 | SIGMOD | 4.4702119e-05 |
| 234 | Crawling the Hidden Web | 2001 | VLDB | 0.00032018108 |
| 409 | Focused Crawling Using Context Graphs | 2000 | VLDB | 0.00023944056 |