Optimal Algorithms for Crawling a Hidden Database in the Web
Summary: Presents algorithms that extract all tuples from a hidden web database through a query-only interface, even when each query returns only partial results. The algorithms are provably efficient in the worst case and asymptotically optimal, and are evaluated in extensive experiments on real datasets. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Cheng Sheng
2. Nan Zhang
3. Yufei Tao
4. Xin Jin
Incoming Citations (Sorted by Pagerank)
Showing 6 of 6 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 8,129 | Discovering the Skyline of Web Databases | 2016 | VLDB | 4.5784968e-05 |
| 8,678 | Progressive Deep Web Crawling Through Keyword Queries For Data Enrichment | 2019 | SIGMOD | 4.4702119e-05 |
| 9,432 | Aggregate Estimation Over Dynamic Hidden Web Databases | 2014 | VLDB | 4.3431757e-05 |
| 11,883 | Query Reranking As A Service | 2016 | VLDB | 4.1945683e-05 |
| 12,088 | Rank Discovery From Web Databases | 2013 | VLDB | 4.1945683e-05 |
| 13,411 | HDBTracker: Monitoring the Aggregates On Dynamic Hidden Web Databases | 2014 | VLDB | - |
Outgoing Citations (Sorted by Pagerank)
Showing 10 of 10 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 234 | Crawling the Hidden Web | 2001 | VLDB | 0.00032018108 |
| 902 | Statistical Schema Matching across Web Query Interfaces | 2003 | SIGMOD | 0.00015486247 |
| 1,492 | Distributed Search over the Hidden Web: Hierarchical Database Sampling and Selection | 2002 | VLDB | 0.00011694396 |
| 1,537 | Google's Deep-Web Crawl | 2008 | VLDB | 0.00011465704 |
| 2,362 | Understanding Web Query Interfaces: Best-Effort Parsing with Hidden Syntax | 2004 | SIGMOD | 8.9582251e-05 |
| 2,813 | Mining Search Engine Query Logs via Suggestion Sampling | 2008 | VLDB | 8.0773142e-05 |
| 5,774 | A Hierarchical Approach to Model Web Query Interfaces for Web Source Integration | 2009 | VLDB | 5.3313642e-05 |
| 7,422 | Meaningful Labeling of Integrated Query Interfaces | 2006 | VLDB | 4.7343948e-05 |
| 8,684 | Unbiased Estimation of Size and Other Aggregates Over Hidden Web Databases | 2010 | SIGMOD | 4.4677591e-05 |
| 9,549 | Attribute Domain Discovery for Hidden Web Databases | 2011 | SIGMOD | 4.3258142e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 771 | Distributed Hypertext Resource Discovery Through Examples | 1999 | VLDB | 0.00016887664 |
| 8,129 | Discovering the Skyline of Web Databases | 2016 | VLDB | 4.5784968e-05 |
| 12,301 | Privacy Preservation of Aggregates in Hidden Databases: Why and How? | 2009 | SIGMOD | 4.1945683e-05 |
| 1,537 | Google's Deep-Web Crawl | 2008 | VLDB | 0.00011465704 |
| 5,140 | A Random Walk Approach to Sampling Hidden Databases | 2007 | SIGMOD | 5.668209e-05 |
| 1,492 | Distributed Search over the Hidden Web: Hierarchical Database Sampling and Selection | 2002 | VLDB | 0.00011694396 |
| 8,678 | Progressive Deep Web Crawling Through Keyword Queries For Data Enrichment | 2019 | SIGMOD | 4.4702119e-05 |
| 8,684 | Unbiased Estimation of Size and Other Aggregates Over Hidden Web Databases | 2010 | SIGMOD | 4.4677591e-05 |
| 234 | Crawling the Hidden Web | 2001 | VLDB | 0.00032018108 |
| 3,950 | Probe, Count, and Classify: Categorizing Hidden-Web Databases | 2001 | SIGMOD | 6.5953844e-05 |