Database Paper Browser

Google's Deep-Web Crawl

Summary: Pre-computes submissions for HTML forms and indexes the resulting pages, surfacing Deep-Web content in ordinary search results. Algorithms for selecting input values, detecting type constraints on inputs, and navigating the space of input combinations prune the candidate submissions so that the crawl avoids a Cartesian-product explosion; experiments validate scalability. (summarized by gpt-5-nano on Feb 09 2026)
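To make the pruning idea in the summary concrete, here is a minimal sketch of one way to avoid enumerating every combination of form inputs. It grows input subsets one input at a time and only extends a subset when its submissions return sufficiently distinct result pages. This is an illustration under stated assumptions, not the paper's actual algorithm: the names `informative_templates`, `distinct_fraction`, `fake_fetch`, the `threshold` and `max_inputs` parameters, and the toy form are all hypothetical.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, Hashable, List

Values = Dict[str, List[str]]              # candidate values per form input
Fetch = Callable[[Dict[str, str]], Hashable]  # submission -> page signature


def distinct_fraction(template: FrozenSet[str], values: Values, fetch: Fetch) -> float:
    """Fraction of a template's submissions that yield distinct result pages."""
    names = sorted(template)
    combos = list(product(*(values[n] for n in names)))
    signatures = {fetch(dict(zip(names, combo))) for combo in combos}
    return len(signatures) / len(combos)


def informative_templates(
    values: Values, fetch: Fetch, threshold: float = 0.6, max_inputs: int = 2
) -> List[FrozenSet[str]]:
    """Greedy bottom-up search: extend only the input subsets that pass the test."""
    kept: List[FrozenSet[str]] = []
    frontier = [frozenset([name]) for name in sorted(values)]
    for _ in range(max_inputs):
        survivors = [
            t for t in frontier if distinct_fraction(t, values, fetch) >= threshold
        ]
        kept.extend(survivors)
        # Only survivors are extended, so uninformative subsets are never
        # expanded into their (much larger) supersets on their own account.
        extended = {t | {n} for t in survivors for n in values if n not in t}
        frontier = sorted(extended, key=sorted)
    return kept


# Hypothetical toy form: "sort" changes only the ordering of results, so the
# page signature (assumed here to ignore ordering) does not depend on it.
demo_values: Values = {
    "make": ["ford", "honda"],
    "color": ["red", "blue"],
    "sort": ["asc", "desc"],
}


def fake_fetch(params: Dict[str, str]) -> Hashable:
    """Stand-in for submitting the form and fingerprinting the result page."""
    return (params.get("make"), params.get("color"))


result = informative_templates(demo_values, fake_fetch)
print([sorted(t) for t in result])
```

In this toy run the `sort` input never survives, so the full three-way product of inputs is never generated. A real crawler would replace `fake_fetch` with an actual form submission plus some content fingerprint, and the threshold would need tuning per site.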

Paper ID: 9750
Venue: VLDB
Year: 2008
Pagerank: 0.00011465704
Overall Rank: 1,537 | 89.31%
DOI: -

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 22 of 22 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
1,317 | Harvesting Relational Tables from Lists on the Web | 2009 | VLDB | 0.00012625853
1,527 | Generic Schema Matching, Ten Years Later | 2011 | VLDB | 0.00011499442
2,209 | Data Integration: After the Teenage Years | 2017 | PODS | 9.2868035e-05
2,420 | From Data Fusion to Knowledge Fusion | 2014 | VLDB | 8.8530994e-05
3,478 | Transform-Data-by-Example (TDE): An Extensible Search Engine for Data Transformations | 2018 | VLDB | 7.054159e-05
3,985 | A First Tutorial on Dataspaces | 2008 | VLDB | 6.5626153e-05
4,229 | Harnessing the Deep Web: Present and Future | 2009 | CIDR | 6.3399547e-05
4,695 | DataXFormer: An Interactive Data Transformation Tool | 2015 | SIGMOD | 5.9927993e-05
5,937 | DataXFormer: Leveraging the Web for Semantic Transformations | 2015 | CIDR | 5.2650964e-05
6,133 | DIADEM: Thousands of Websites to a Single Database | 2014 | VLDB | 5.1954702e-05
6,792 | Automatically Incorporating New Sources in Keyword Search-Based Data Integration | 2010 | SIGMOD | 4.9249098e-05
8,129 | Discovering the Skyline of Web Databases | 2016 | VLDB | 4.5784968e-05
8,678 | Progressive Deep Web Crawling Through Keyword Queries For Data Enrichment | 2019 | SIGMOD | 4.4702119e-05
8,696 | Effective Entity Augmentation By Querying External Data Sources | 2023 | VLDB | 4.4660032e-05
9,548 | Optimal Algorithms for Crawling a Hidden Database in the Web | 2012 | VLDB | 4.3258142e-05
11,883 | Query Reranking As A Service | 2016 | VLDB | 4.1945683e-05
12,223 | Schema Clustering and Retrieval for Multi-domain Pay-As-You-Go Data Integration Systems | 2010 | SIGMOD | 4.1945683e-05
12,231 | Optimizing Content Freshness of Relations Extracted From the Web Using Keyword Search | 2010 | SIGMOD | 4.1945683e-05
12,240 | Creating and Exploring Web Form Repositories | 2010 | SIGMOD | 4.1945683e-05
12,301 | Privacy Preservation of Aggregates in Hidden Databases: Why and How? | 2009 | SIGMOD | 4.1945683e-05
12,326 | Kosmix: High-Performance Topic Exploration using the Deep Web | 2009 | VLDB | 4.1945683e-05
12,349 | Answering Web Questions Using Structured Data – Dream or Reality? Panel Discussion | 2009 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 7 of 7 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Semantically Similar Papers

Overall Rank | Paper | Year | Venue | Pagerank
7,768 | Accurate and Efficient Crawling for Relevant Websites | 2004 | VLDB | 4.6563056e-05
12,240 | Creating and Exploring Web Form Repositories | 2010 | SIGMOD | 4.1945683e-05
107 | WebTables: Exploring the Power of Tables on the Web | 2008 | VLDB | 0.00048377684
11,722 | Deeper: A Data Enrichment System Powered by Deep Web | 2018 | SIGMOD | 4.1945683e-05
9,433 | Exploration of Deep Web Repositories | 2011 | VLDB | 4.3431757e-05
3,950 | Probe, Count, and Classify: Categorizing Hidden-Web Databases | 2001 | SIGMOD | 6.5953844e-05
9,548 | Optimal Algorithms for Crawling a Hidden Database in the Web | 2012 | VLDB | 4.3258142e-05
234 | Crawling the Hidden Web | 2001 | VLDB | 0.00032018108
8,678 | Progressive Deep Web Crawling Through Keyword Queries For Data Enrichment | 2019 | SIGMOD | 4.4702119e-05
4,229 | Harnessing the Deep Web: Present and Future | 2009 | CIDR | 6.3399547e-05