A Random Walk Approach to Sampling Hidden Databases
Summary: The paper performs a random walk over the query space of a form-like web interface to obtain a uniform sample of the hidden backend database. The choice between fixed and random attribute ordering, together with a probabilistic rejection technique, improves sample quality; extensive experiments demonstrate accuracy and efficiency. (summarized by gpt-5-nano on Feb 09 2026)
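The idea in the summary can be illustrated with a minimal sketch. It is not the paper's exact algorithm, and it rests on several assumptions that are not stated above: attributes are Boolean, the interface returns at most `k` tuples per query and flags overflow, and all tuples are distinct. The names `make_interface`, `random_walk_sample`, `draw_samples`, and `scale` are hypothetical.

```python
import random

def make_interface(db, k):
    """Hypothetical top-k form interface over a list of Boolean tuples."""
    def query(prefix):
        # prefix maps attribute index -> required value (0 or 1)
        matches = [t for t in db if all(t[a] == v for a, v in prefix.items())]
        return matches[:k], len(matches) > k  # (visible results, overflow flag)
    return query

def random_walk_sample(query, num_attrs, k, rng, random_order=False, scale=None):
    """Run one walk; return a tuple, or None if the walk fails or is rejected."""
    order = list(range(num_attrs))
    if random_order:
        rng.shuffle(order)  # random attribute ordering for this walk
    if scale is None:
        # Assumed normalizer: an upper bound on 2**depth * len(results),
        # so the acceptance probability below never exceeds 1.
        scale = (2 ** num_attrs) * k
    prefix, depth = {}, 0
    results, overflow = query(prefix)
    for attr in order:
        if not overflow:
            break  # query is either valid (1..k results) or empty
        prefix[attr] = rng.randint(0, 1)  # extend the query by one random predicate
        results, overflow = query(prefix)
        depth += 1
    if not results:
        return None  # empty query: the caller simply starts a new walk
    # A tuple found at this node is reached with probability 1 / (2**depth * len(results)),
    # so accepting in proportion to the inverse cancels the skew toward shallow nodes.
    if rng.random() < (2 ** depth) * len(results) / scale:
        return rng.choice(results)
    return None

def draw_samples(query, num_attrs, k, n, seed=0, random_order=False, max_walks=100_000):
    """Repeat walks until n tuples are accepted or max_walks is exhausted."""
    rng = random.Random(seed)
    samples = []
    for _ in range(max_walks):
        if len(samples) == n:
            break
        t = random_walk_sample(query, num_attrs, k, rng, random_order)
        if t is not None:
            samples.append(t)
    return samples

db = [(0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
samples = draw_samples(make_interface(db, k=2), num_attrs=3, k=2, n=50)
```

With the default `scale`, every tuple is emitted with the same probability per walk, at the cost of many rejected walks. Passing a smaller `scale` would accept more often but reintroduce some skew, which is the trade-off a rejection step of this kind has to balance.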
Authors
1. Arjun Dasgupta
2. Gautam Das
3. Heikki Mannila
Incoming Citations (Sorted by Pagerank)
Showing 15 of 15 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 7 of 7 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 28 | Accurate Estimation Of The Number Of Tuples Satisfying A Condition | 1984 | SIGMOD | 0.00080435857 |
| 234 | Crawling the Hidden Web | 2001 | VLDB | 0.00032018108 |
| 449 | Approximate Query Processing: Taming the TeraBytes! A Tutorial | 2001 | VLDB | 0.00022846068 |
| 1,131 | Automatic Discovery of Language Models for Text Databases | 1999 | SIGMOD | 0.00013777757 |
| 1,797 | Effective Use of Block-Level Sampling in Statistics Estimation | 2004 | SIGMOD | 0.00010523169 |
| 3,950 | Probe, Count, and Classify: Categorizing Hidden-Web Databases | 2001 | SIGMOD | 6.5953844e-05 |
| 4,100 | A Bi-Level Bernoulli Scheme for Database Sampling | 2004 | SIGMOD | 6.4531387e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 12,301 | Privacy Preservation of Aggregates in Hidden Databases: Why and How? | 2009 | SIGMOD | 4.1945683e-05 |
| 1,492 | Distributed Search over the Hidden Web: Hierarchical Database Sampling and Selection | 2002 | VLDB | 0.00011694396 |
| 3,950 | Probe, Count, and Classify: Categorizing Hidden-Web Databases | 2001 | SIGMOD | 6.5953844e-05 |
| 12,088 | Rank Discovery From Web Databases | 2013 | VLDB | 4.1945683e-05 |
| 46 | Simple Random Sampling from Relational Databases | 1986 | VLDB | 0.00070894702 |
| 13,543 | HDSampler: Revealing Data Behind Web Form Interfaces | 2009 | SIGMOD | - |
| 7,718 | Approximating Aggregate Queries about Web Pages via Random Walks | 2000 | VLDB | 4.6688065e-05 |
| 8,684 | Unbiased Estimation of Size and Other Aggregates Over Hidden Web Databases | 2010 | SIGMOD | 4.4677591e-05 |
| 9,548 | Optimal Algorithms for Crawling a Hidden Database in the Web | 2012 | VLDB | 4.3258142e-05 |
| 12,189 | Randomized Generalization for Aggregate Suppression Over Hidden Web Databases | 2011 | VLDB | 4.1945683e-05 |