Harnessing the Deep Web: Present and Future
Summary: Describes a Google deployment comparing two approaches to exposing Deep Web content: virtual integration (VI), which relies on per-site mediated schemas, and surfacing, which precomputes form submissions. Surfacing scales for general web search, while VI is better suited to verticals. Open problems include routing, form semantics, and preserving structure. (summarized by gpt-5-mini on Feb 09 2026)
Authors
1. Jayant Madhavan
2. Loredana Afanasiev
3. Lyublena Antova
4. Alon Halevy
Incoming Citations (Sorted by Pagerank)
Showing 5 of 5 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1,221 | A Web of Concepts | 2009 | PODS | 0.00013219242 |
| 1,851 | An Analysis of Structured Data on the Web | 2012 | VLDB | 0.00010327871 |
| 2,617 | Extraction and Integration of Partially Overlapping Web Sources | 2013 | VLDB | 0.000084462621 |
| 3,678 | Automatic Wrappers for Large Scale Web Extraction | 2011 | VLDB | 0.000068517545 |
| 6,586 | Web Data Management | 2011 | SIGMOD | 0.000050023398 |
Outgoing Citations (Sorted by Pagerank)
Showing 9 of 9 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 107 | WebTables: Exploring the Power of Tables on the Web | 2008 | VLDB | 0.00048377684 |
| 127 | Querying Heterogeneous Information Sources Using Source Descriptions | 1996 | VLDB | 0.00044642203 |
| 234 | Crawling the Hidden Web | 2001 | VLDB | 0.00032018108 |
| 672 | An Interactive Clustering-based Approach to Integrating Source Query Interfaces on the Deep Web | 2004 | SIGMOD | 0.00018355746 |
| 902 | Statistical Schema Matching across Web Query Interfaces | 2003 | SIGMOD | 0.00015486247 |
| 1,147 | Web-scale Data Integration: You can only afford to Pay As You Go | 2007 | CIDR | 0.00013677658 |
| 1,492 | Distributed Search over the Hidden Web: Hierarchical Database Sampling and Selection | 2002 | VLDB | 0.00011694396 |
| 1,537 | Google's Deep-Web Crawl | 2008 | VLDB | 0.00011465704 |
| 1,858 | Bootstrapping Pay-As-You-Go Data Integration Systems | 2008 | SIGMOD | 0.00010301124 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1,492 | Distributed Search over the Hidden Web: Hierarchical Database Sampling and Selection | 2002 | VLDB | 0.00011694396 |
| 11,722 | Deeper: A Data Enrichment System Powered by Deep Web | 2018 | SIGMOD | 0.000041945683 |
| 8,678 | Progressive Deep Web Crawling Through Keyword Queries For Data Enrichment | 2019 | SIGMOD | 0.000044702119 |
| 5,774 | A Hierarchical Approach to Model Web Query Interfaces for Web Source Integration | 2009 | VLDB | 0.000053313642 |
| 2,095 | Knocking the Door to the Deep Web: Integrating Web Query Interfaces | 2004 | SIGMOD | 0.000095505068 |
| 9,433 | Exploration of Deep Web Repositories | 2011 | VLDB | 0.000043431757 |
| 12,240 | Creating and Exploring Web Form Repositories | 2010 | SIGMOD | 0.000041945683 |
| 234 | Crawling the Hidden Web | 2001 | VLDB | 0.00032018108 |
| 672 | An Interactive Clustering-based Approach to Integrating Source Query Interfaces on the Deep Web | 2004 | SIGMOD | 0.00018355746 |
| 1,537 | Google's Deep-Web Crawl | 2008 | VLDB | 0.00011465704 |