Distributed Search over the Hidden Web: Hierarchical Database Sampling and Selection
Summary: Proposes metasearch over hidden-web databases via adaptive probes to produce content summaries with absolute word-frequency estimates. Introduces a hierarchical selection algorithm using these summaries and an induced taxonomy to surpass flat methods, validated on 50 real databases.
(summarized by gpt-5-nano on Feb 09 2026)
- Paper ID: 8861
- Venue: VLDB
- Year: 2002
- Pagerank: 0.00011694396
- Overall Rank: 1,492 | 89.63%
- DOI:
Incoming Citations (Sorted by Pagerank)
Showing 15 of 15 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 759 | To Search or to Crawl? Towards a Query Optimizer for Text-Centric Tasks | 2006 | SIGMOD | 0.00017064615 |
| 1,537 | Google's Deep-Web Crawl | 2008 | VLDB | 0.00011465704 |
| 1,862 | Information Sharing Across Private Databases | 2003 | SIGMOD | 0.00010286859 |
| 4,229 | Harnessing the Deep Web: Present and Future | 2009 | CIDR | 6.3399547e-05 |
| 5,672 | Effective Keyword-based Selection of Relational Databases | 2007 | SIGMOD | 5.3784128e-05 |
| 6,845 | Facet Discovery for Structured Web Search: A Query-log Mining Approach | 2011 | SIGMOD | 4.9092609e-05 |
| 7,890 | Mining a Search Engine’s Corpus: Efficient Yet Unbiased Sampling and Aggregate Estimation | 2011 | SIGMOD | 4.6249533e-05 |
| 8,678 | Progressive Deep Web Crawling Through Keyword Queries For Data Enrichment | 2019 | SIGMOD | 4.4702119e-05 |
| 8,684 | Unbiased Estimation of Size and Other Aggregates Over Hidden Web Databases | 2010 | SIGMOD | 4.4677591e-05 |
| 9,548 | Optimal Algorithms for Crawling a Hidden Database in the Web | 2012 | VLDB | 4.3258142e-05 |
| 9,549 | Attribute Domain Discovery for Hidden Web Databases | 2011 | SIGMOD | 4.3258142e-05 |
| 12,088 | Rank Discovery From Web Databases | 2013 | VLDB | 4.1945683e-05 |
| 12,301 | Privacy Preservation of Aggregates in Hidden Databases: Why and How? | 2009 | SIGMOD | 4.1945683e-05 |
| 12,326 | Kosmix: High-Performance Topic Exploration using the Deep Web | 2009 | VLDB | 4.1945683e-05 |
| 12,575 | When one Sample is not Enough: Improving Text Database Selection Using Shrinkage | 2004 | SIGMOD | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 4 of 4 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 8,678 | Progressive Deep Web Crawling Through Keyword Queries For Data Enrichment | 2019 | SIGMOD | 4.4702119e-05 |
| 13,808 | A Method of Re-ranking Web Search Results Using their Hidden Hyperlink Structure | 2002 | VLDB | - |
| 8,684 | Unbiased Estimation of Size and Other Aggregates Over Hidden Web Databases | 2010 | SIGMOD | 4.4677591e-05 |
| 1,033 | Determining Text Databases to Search in the Internet | 1998 | VLDB | 0.00014543835 |
| 5,672 | Effective Keyword-based Selection of Relational Databases | 2007 | SIGMOD | 5.3784128e-05 |
| 12,575 | When one Sample is not Enough: Improving Text Database Selection Using Shrinkage | 2004 | SIGMOD | 4.1945683e-05 |
| 771 | Distributed Hypertext Resource Discovery Through Examples | 1999 | VLDB | 0.00016887664 |
| 9,548 | Optimal Algorithms for Crawling a Hidden Database in the Web | 2012 | VLDB | 4.3258142e-05 |
| 8,691 | Efficient and Effective Metasearch for Text Databases Incorporating Linkages among Documents | 2001 | SIGMOD | 4.466355e-05 |
| 3,950 | Probe, Count, and Classify: Categorizing Hidden-Web Databases | 2001 | SIGMOD | 6.5953844e-05 |