Probe, Count, and Classify: Categorizing Hidden-Web Databases
Summary: Automates the categorization of hidden-web databases using a small set of query probes. The method relies only on the number of matches each probe returns, so no result pages are retrieved. Evaluated on more than 100 real databases, it achieves high accuracy with low overhead for automatic hierarchical categorization. (summarized by gpt-5-nano on Feb 09 2026)
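To make "probe, count, classify" concrete, here is a minimal sketch of one plausible way to turn per-probe match counts into a hierarchical category assignment. It is not the paper's code or probe set. Everything here is an assumption for illustration: the category tree, the keyword probes, the toy in-memory corpus standing in for a search interface that reports only match counts, the helper names (`make_counter`, `coverage`, `classify`), and the coverage/specificity thresholds `tau_cov` and `tau_spec`.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Category:
    name: str
    probes: list[str] = field(default_factory=list)  # hypothetical keyword probes
    children: list[Category] = field(default_factory=list)


def make_counter(docs: list[str]) -> Callable[[str], int]:
    """Simulate a hidden-web search interface that reports only a match count."""
    tokenized = [set(d.split()) for d in docs]

    def count_matches(query: str) -> int:
        words = query.split()
        # Conjunctive match: a document counts only if it contains every query word.
        return sum(all(w in toks for w in words) for toks in tokenized)

    return count_matches


def coverage(cat: Category, count_matches: Callable[[str], int]) -> int:
    # Total matches across this category's probes; no documents are fetched.
    return sum(count_matches(q) for q in cat.probes)


def classify(node: Category, count_matches: Callable[[str], int],
             tau_cov: int, tau_spec: float, spec_parent: float = 1.0) -> list[str]:
    """Push the database down while a child's coverage and specificity clear the thresholds."""
    covs = [coverage(c, count_matches) for c in node.children]
    total = sum(covs)
    results: list[str] = []
    for child, cov in zip(node.children, covs):
        # Assumed specificity: parent's specificity times this child's share of sibling matches.
        spec = spec_parent * cov / total if total else 0.0
        if cov >= tau_cov and spec >= tau_spec:
            results.extend(classify(child, count_matches, tau_cov, tau_spec, spec))
    # If no child qualifies, the database stays at the current node.
    return results or [node.name]


# Hypothetical category tree and toy database contents.
root = Category("Root", children=[
    Category("Health", ["cancer", "diagnosis", "treatment", "diabetes"], [
        Category("Diseases", ["cancer", "diabetes", "disease"]),
        Category("Fitness", ["exercise", "gym"]),
    ]),
    Category("Sports", ["soccer", "basketball"], [
        Category("Soccer", ["soccer"]),
    ]),
])
docs = [
    "cancer treatment therapy",
    "cancer symptoms diagnosis",
    "diabetes insulin treatment",
    "heart disease diagnosis",
    "soccer world cup",
]
count = make_counter(docs)
print(classify(root, count, tau_cov=1, tau_spec=0.5))  # → ['Diseases']
```

Raising `tau_cov` to 5 leaves the database at `Health`, because `Diseases` then has too few matches to qualify. The thresholds trade depth for confidence: a database moves into a subcategory only when the counts give enough evidence.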
Incoming Citations (Sorted by PageRank)
4 citing papers.
| Overall Rank | Citing Paper | Year | Venue | PageRank |
|---|---|---|---|---|
| 1,492 | Distributed Search over the Hidden Web: Hierarchical Database Sampling and Selection | 2002 | VLDB | 1.1694396e-04 |
| 2,095 | Knocking the Door to the Deep Web: Integrating Web Query Interfaces | 2004 | SIGMOD | 9.5505068e-05 |
| 5,140 | A Random Walk Approach to Sampling Hidden Databases | 2007 | SIGMOD | 5.668209e-05 |
| 12,634 | From Focused Crawling to Expert Information: an Application Framework for Web Exploration and Portal Generation | 2003 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by PageRank)
2 cited papers.
Only citations to other VLDB, SIGMOD, CIDR, and PODS papers in this database are counted.
| Overall Rank | Cited Paper | Year | Venue | PageRank |
|---|---|---|---|---|
| 1,033 | Determining Text Databases to Search in the Internet | 1998 | VLDB | 1.4543835e-04 |
| 1,131 | Automatic Discovery of Language Models for Text Databases | 1999 | SIGMOD | 1.3777757e-04 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | PageRank |
|---|---|---|---|---|
| 5,140 | A Random Walk Approach to Sampling Hidden Databases | 2007 | SIGMOD | 5.668209e-05 |
| 2,425 | Instance-based Schema Matching for Web Databases by Domain-specific Query Probing | 2004 | VLDB | 8.8376569e-05 |
| 107 | WebTables: Exploring the Power of Tables on the Web | 2008 | VLDB | 4.8377684e-04 |
| 771 | Distributed Hypertext Resource Discovery Through Examples | 1999 | VLDB | 1.6887664e-04 |
| 1,537 | Google's Deep-Web Crawl | 2008 | VLDB | 1.1465704e-04 |
| 8,678 | Progressive Deep Web Crawling Through Keyword Queries For Data Enrichment | 2019 | SIGMOD | 4.4702119e-05 |
| 234 | Crawling the Hidden Web | 2001 | VLDB | 3.2018108e-04 |
| 8,684 | Unbiased Estimation of Size and Other Aggregates Over Hidden Web Databases | 2010 | SIGMOD | 4.4677591e-05 |
| 1,492 | Distributed Search over the Hidden Web: Hierarchical Database Sampling and Selection | 2002 | VLDB | 1.1694396e-04 |
| 9,548 | Optimal Algorithms for Crawling a Hidden Database in the Web | 2012 | VLDB | 4.3258142e-05 |