RankMass Crawler: A Crawler with High Personalized PageRank Coverage Guarantee
Summary: The RankMass crawler family uses Personalized PageRank to bound the fraction of "important" Web pages covered by a crawl under a fixed budget. The approach reaches high coverage early in the crawl, backed by a theoretical guarantee and an empirical evaluation on 141M URLs. (summarized by gpt-5-nano on Feb 09 2026)
Incoming Non-self Citations Over Time
Authors
- Junghoo Cho
- Uri Schonfeld
Incoming Citations (Sorted by Pagerank)
Showing 2 of 2 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 2,098 | Efficient Ad-hoc Search for Personalized PageRank | 2013 | SIGMOD | 9.5480012e-05 |
| 12,334 | SHARC: Framework for Quality-Conscious Web Archiving | 2009 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 6 of 6 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 234 | Crawling the Hidden Web | 2001 | VLDB | 0.00032018108 |
| 409 | Focused Crawling Using Context Graphs | 2000 | VLDB | 0.00023944056 |
| 2,539 | Computing PageRank in a Distributed Internet Search System | 2004 | VLDB | 8.5820857e-05 |
| 2,564 | Combating Web Spam with TrustRank | 2004 | VLDB | 8.5277793e-05 |
| 7,767 | Efficient and Decentralized PageRank Approximation in a Peer-to-Peer Web Search Network | 2006 | VLDB | 4.6563056e-05 |
| 7,768 | Accurate and Efficient Crawling for Relevant Websites | 2004 | VLDB | 4.6563056e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1,821 | Computing Personalized PageRank Quickly by Exploiting Graph Structures | 2014 | VLDB | 0.00010423565 |
| 7,718 | Approximating Aggregate Queries about Web Pages via Random Walks | 2000 | VLDB | 4.6688065e-05 |
| 3,091 | Optimized Query Execution in Large Search Engines with Global Page Ordering | 2003 | VLDB | 7.5805947e-05 |
| 5,501 | Page Quality: In Search of an Unbiased Web Ranking | 2005 | SIGMOD | 5.4742188e-05 |
| 234 | Crawling the Hidden Web | 2001 | VLDB | 0.00032018108 |
| 9,548 | Optimal Algorithms for Crawling a Hidden Database in the Web | 2012 | VLDB | 4.3258142e-05 |
| 6,928 | The Evolution of the Web and Implications for an Incremental Crawler | 2000 | VLDB | 4.8925595e-05 |
| 409 | Focused Crawling Using Context Graphs | 2000 | VLDB | 0.00023944056 |
| 7,768 | Accurate and Efficient Crawling for Relevant Websites | 2004 | VLDB | 4.6563056e-05 |
| 2,539 | Computing PageRank in a Distributed Internet Search System | 2004 | VLDB | 8.5820857e-05 |