Probe, Count, and Classify: Categorizing Hidden-Web Databases

Summary: Automates the categorization of hidden-web databases using a small set of query probes. The method relies only on the number of matches each probe returns and does not retrieve any pages. Evaluated on more than 100 real databases, it achieves low overhead and high accuracy for automatic hierarchical categorization. (summarized by gpt-5-nano on Feb 09 2026)
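The idea of classifying from match counts alone can be illustrated with a minimal sketch. This is not the paper's algorithm: the probe words, the category names, the `count_matches` callback, and the `min_share` threshold are all hypothetical, chosen only to show how per-probe counts could drive a category decision without fetching any result pages.

```python
from typing import Callable, Dict, List


def classify_by_probe_counts(
    probes: Dict[str, List[str]],
    count_matches: Callable[[str], int],
    min_share: float = 0.4,
) -> Dict[str, float]:
    """Return categories whose share of all probe matches is >= min_share.

    probes        : hypothetical mapping of category -> probe queries
    count_matches : hypothetical callback returning the number of matches
                    a database reports for one query (no pages fetched)
    min_share     : assumed threshold, not a value from the paper
    """
    # Sum the reported match counts of all probes belonging to each category.
    totals = {
        category: sum(count_matches(query) for query in queries)
        for category, queries in probes.items()
    }
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}  # no probe matched, so no category can be assigned

    # Keep only the categories that account for a large enough share.
    shares = {category: n / grand_total for category, n in totals.items()}
    return {c: s for c, s in shares.items() if s >= min_share}


# Usage with made-up counts standing in for a real database's search form.
fake_counts = {
    "hepatitis": 900, "cholesterol": 600,
    "soccer": 40, "playoffs": 10,
    "compiler": 20, "router": 30,
}
probes = {
    "Health": ["hepatitis", "cholesterol"],
    "Sports": ["soccer", "playoffs"],
    "Computers": ["compiler", "router"],
}
result = classify_by_probe_counts(probes, lambda q: fake_counts.get(q, 0))
print(result)
```

For a hierarchical scheme, the same step could in principle be repeated with subcategory probes under each category that passes the threshold; that extension is likewise an assumption of this sketch.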

Paper ID: 3259
Venue: SIGMOD
Year: 2001
Pagerank: 6.5891156e-05
Overall Rank: 3,956 | 72.51%
DOI: -

Incoming Citations (Sorted by Pagerank)

Showing 4 of 4 citing papers.

Rank	Citing Paper	Year	Venue	Pagerank
1,486	Distributed Search over the Hidden Web: Hierarchical Database Sampling and Selection	2002	VLDB	0.00011691409
2,098	Knocking the Door to the Deep Web: Integrating Web Query Interfaces	2004	SIGMOD	9.5432874e-05
5,141	A Random Walk Approach to Sampling Hidden Databases	2007	SIGMOD	5.6627467e-05
12,643	From Focused Crawling to Expert Information: an Application Framework for Web Exploration and Portal Generation	2003	VLDB	4.1905499e-05

Outgoing Citations (Sorted by Pagerank)

Showing 2 of 2 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank	Cited Paper	Year	Venue	Pagerank
1,057	Determining Text Databases to Search in the Internet	1998	VLDB	0.00014376054
1,133	Automatic Discovery of Language Models for Text Databases	1999	SIGMOD	0.00013765547

Semantically Similar Papers

Overall Rank	Paper	Year	Venue	Pagerank
5,141	A Random Walk Approach to Sampling Hidden Databases	2007	SIGMOD	5.6627467e-05
2,428	Instance-based Schema Matching for Web Databases by Domain-specific Query Probing	2004	VLDB	8.8293516e-05
108	WebTables: Exploring the Power of Tables on the Web	2008	VLDB	0.00048345996
767	Distributed Hypertext Resource Discovery Through Examples	1999	VLDB	0.00016881135
1,538	Google's Deep-Web Crawl	2008	VLDB	0.00011455291
8,674	Progressive Deep Web Crawling Through Keyword Queries For Data Enrichment	2019	SIGMOD	4.4659264e-05
233	Crawling the Hidden Web	2001	VLDB	0.00031996412
8,680	Unbiased Estimation of Size and Other Aggregates Over Hidden Web Databases	2010	SIGMOD	4.4634799e-05
1,486	Distributed Search over the Hidden Web: Hierarchical Database Sampling and Selection	2002	VLDB	0.00011691409
9,549	Optimal Algorithms for Crawling a Hidden Database in the Web	2012	VLDB	4.3216687e-05