Google's Deep-Web Crawl
Summary: Pre-computes HTML-form submissions to surface Deep-Web content and indexes the resulting pages. Algorithms for input-value selection, type-constraint detection, and navigation prune the space of form-input combinations to avoid a Cartesian explosion; experiments validate scalability. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Jayant Madhavan
2. David Ko
3. Łucja Kot
4. Vignesh Ganapathy
5. Alex Rasmussen
6. Alon Halevy
Incoming Citations (Sorted by Pagerank)
Showing 22 of 22 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 7 of 7 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 208 | Reconciling Schemas of Disparate Data Sources: A Machine-Learning Approach | 2001 | SIGMOD | 0.0003460594 |
| 234 | Crawling the Hidden Web | 2001 | VLDB | 0.00032018108 |
| 291 | Answering Queries Using Templates With Binding Patterns (Extended Abstract) | 1995 | PODS | 0.00028831632 |
| 672 | An Interactive Clustering-based Approach to Integrating Source Query Interfaces on the Deep Web | 2004 | SIGMOD | 0.00018355746 |
| 1,147 | Web-scale Data Integration: You can only afford to Pay As You Go | 2007 | CIDR | 0.00013677658 |
| 1,492 | Distributed Search over the Hidden Web: Hierarchical Database Sampling and Selection | 2002 | VLDB | 0.00011694396 |
| 2,425 | Instance-based Schema Matching for Web Databases by Domain-specific Query Probing | 2004 | VLDB | 8.8376569e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 7,768 | Accurate and Efficient Crawling for Relevant Websites | 2004 | VLDB | 4.6563056e-05 |
| 12,240 | Creating and Exploring Web Form Repositories | 2010 | SIGMOD | 4.1945683e-05 |
| 107 | WebTables: Exploring the Power of Tables on the Web | 2008 | VLDB | 0.00048377684 |
| 11,722 | Deeper: A Data Enrichment System Powered by Deep Web | 2018 | SIGMOD | 4.1945683e-05 |
| 9,433 | Exploration of Deep Web Repositories | 2011 | VLDB | 4.3431757e-05 |
| 3,950 | Probe, Count, and Classify: Categorizing Hidden-Web Databases | 2001 | SIGMOD | 6.5953844e-05 |
| 9,548 | Optimal Algorithms for Crawling a Hidden Database in the Web | 2012 | VLDB | 4.3258142e-05 |
| 234 | Crawling the Hidden Web | 2001 | VLDB | 0.00032018108 |
| 8,678 | Progressive Deep Web Crawling Through Keyword Queries For Data Enrichment | 2019 | SIGMOD | 4.4702119e-05 |
| 4,229 | Harnessing the Deep Web: Present and Future | 2009 | CIDR | 6.3399547e-05 |