Database Paper Browser

Crawling the Hidden Web

Summary: Proposes a generic operational model for a hidden Web crawler and describes its realization in HiWE (Hidden Web Exposer), a prototype that extracts content hidden behind search forms. Introduces LITE (Layout-based Information Extraction Technique) for extracting semantic information from search forms and response pages, and reports experiments validating the approach. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 8737
Venue: VLDB
Year: 2001
Pagerank: 0.00032018108
Overall Rank: 234 | 98.38%
DOI: -

Incoming Citations (Sorted by Pagerank)

Showing 32 of 32 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
672 | An Interactive Clustering-based Approach to Integrating Source Query Interfaces on the Deep Web | 2004 | SIGMOD | 0.00018355746
1,147 | Web-scale Data Integration: You can only afford to Pay As You Go | 2007 | CIDR | 0.00013677658
1,527 | Generic Schema Matching, Ten Years Later | 2011 | VLDB | 0.00011499442
1,537 | Google's Deep-Web Crawl | 2008 | VLDB | 0.00011465704
1,734 | Entity Search Engine: Towards Agile Best-Effort Information Integration over the Web | 2007 | CIDR | 0.00010723542
2,362 | Understanding Web Query Interfaces: Best-Effort Parsing with Hidden Syntax | 2004 | SIGMOD | 8.9582251e-05
2,425 | Instance-based Schema Matching for Web Databases by Domain-specific Query Probing | 2004 | VLDB | 8.8376569e-05
2,447 | WISE-Integrator: An Automatic Integrator of Web Search Interfaces for E-Commerce | 2003 | VLDB | 8.8037197e-05
2,492 | Partial Results for Online Query Processing | 2002 | SIGMOD | 8.6526489e-05
2,539 | Computing PageRank in a Distributed Internet Search System | 2004 | VLDB | 8.5820857e-05
3,285 | Using the Structure of Web Sites for Automatic Segmentation of Tables | 2004 | SIGMOD | 7.2759001e-05
4,229 | Harnessing the Deep Web: Present and Future | 2009 | CIDR | 6.3399547e-05
5,140 | A Random Walk Approach to Sampling Hidden Databases | 2007 | SIGMOD | 5.668209e-05
5,442 | RankMass Crawler: A Crawler with High Personalized PageRank Coverage Guarantee | 2007 | VLDB | 5.5026403e-05
5,774 | A Hierarchical Approach to Model Web Query Interfaces for Web Source Integration | 2009 | VLDB | 5.3313642e-05
7,637 | Predicate-based Indexing of Enterprise Web Applications | 2007 | CIDR | 4.6905993e-05
8,129 | Discovering the Skyline of Web Databases | 2016 | VLDB | 4.5784968e-05
8,460 | WISE-Integrator: A System for Extracting and Integrating Complex Web Search Interfaces of the Deep Web | 2005 | VLDB | 4.5061526e-05
8,678 | Progressive Deep Web Crawling Through Keyword Queries For Data Enrichment | 2019 | SIGMOD | 4.4702119e-05
8,684 | Unbiased Estimation of Size and Other Aggregates Over Hidden Web Databases | 2010 | SIGMOD | 4.4677591e-05
8,878 | Learning to Extract Form Labels | 2008 | VLDB | 4.4302126e-05
9,432 | Aggregate Estimation Over Dynamic Hidden Web Databases | 2014 | VLDB | 4.3431757e-05
9,548 | Optimal Algorithms for Crawling a Hidden Database in the Web | 2012 | VLDB | 4.3258142e-05
9,549 | Attribute Domain Discovery for Hidden Web Databases | 2011 | SIGMOD | 4.3258142e-05
11,883 | Query Reranking As A Service | 2016 | VLDB | 4.1945683e-05
12,088 | Rank Discovery From Web Databases | 2013 | VLDB | 4.1945683e-05
12,189 | Randomized Generalization for Aggregate Suppression Over Hidden Web Databases | 2011 | VLDB | 4.1945683e-05
12,231 | Optimizing Content Freshness of Relations Extracted From the Web Using Keyword Search | 2010 | SIGMOD | 4.1945683e-05
12,326 | Kosmix: High-Performance Topic Exploration using the Deep Web | 2009 | VLDB | 4.1945683e-05
12,525 | Automatic Extraction of Dynamic Record Sections From Search Engine Result Pages | 2006 | VLDB | 4.1945683e-05
12,590 | An Automatic Data Grabber for Large Web Sites | 2004 | VLDB | 4.1945683e-05
12,634 | From Focused Crawling to Expert Information: an Application Framework for Web Exploration and Portal Generation | 2003 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 3 of 3 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
409 | Focused Crawling Using Context Graphs | 2000 | VLDB | 0.00023944056
1,304 | Synchronizing a Database to Improve Freshness | 2000 | SIGMOD | 0.00012691283
6,928 | The Evolution of the Web and Implications for an Incremental Crawler | 2000 | VLDB | 4.8925595e-05

Semantically Similar Papers