Crawling the Hidden Web
Summary: Proposes a hidden Web crawler model and HiWE (Hidden Web Exposer) for extracting content behind search forms and logins. Introduces LITE, a layout-based extraction technique to semantically parse forms and result pages, with experimental validation. (summarized by gpt-5-nano on Feb 09 2026)
Incoming Citations (Sorted by Pagerank)
Showing 32 of 32 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 3 of 3 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 409 | Focused Crawling Using Context Graphs | 2000 | VLDB | 0.00023944056 |
| 1,304 | Synchronizing a Database to Improve Freshness | 2000 | SIGMOD | 0.00012691283 |
| 6,928 | The Evolution of the Web and Implications for an Incremental Crawler | 2000 | VLDB | 4.8925595e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 5,774 | A Hierarchical Approach to Model Web Query Interfaces for Web Source Integration | 2009 | VLDB | 5.3313642e-05 |
| 13,808 | A Method of Re-ranking Web Search Results Using their Hidden Hyperlink Structure | 2002 | VLDB | - |
| 1,492 | Distributed Search over the Hidden Web: Hierarchical Database Sampling and Selection | 2002 | VLDB | 0.00011694396 |
| 409 | Focused Crawling Using Context Graphs | 2000 | VLDB | 0.00023944056 |
| 4,229 | Harnessing the Deep Web: Present and Future | 2009 | CIDR | 6.3399547e-05 |
| 8,678 | Progressive Deep Web Crawling Through Keyword Queries For Data Enrichment | 2019 | SIGMOD | 4.4702119e-05 |
| 7,768 | Accurate and Efficient Crawling for Relevant Websites | 2004 | VLDB | 4.6563056e-05 |
| 3,950 | Probe, Count, and Classify: Categorizing Hidden-Web Databases | 2001 | SIGMOD | 6.5953844e-05 |
| 1,537 | Google's Deep-Web Crawl | 2008 | VLDB | 0.00011465704 |
| 9,548 | Optimal Algorithms for Crawling a Hidden Database in the Web | 2012 | VLDB | 4.3258142e-05 |