Progressive Deep Web Crawling Through Keyword Queries For Data Enrichment
Summary: SmartCrawl enables progressive deep-web crawling via keyword queries to enrich a local database under top-k and query-budget limits. It constructs a local data-driven query pool and iteratively issues high-benefit queries to maximize coverage, addressing ΔD and top-k effects with novel benefit estimation and optimizations. (summarized by gpt-5-nano on Feb 09 2026)
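The loop described in the summary (build a query pool from the local data, then repeatedly issue the query with the highest estimated benefit until the budget runs out) can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's algorithm. Every name here (`tokenize`, `build_query_pool`, `search_hidden`, `estimated_benefit`, `crawl_sketch`) is hypothetical. The hidden database is simulated by an in-memory list, records are matched by exact string equality, and the benefit estimate is a naive count that ignores the ΔD correction mentioned in the summary.

```python
from itertools import combinations


def tokenize(record):
    """Split a record into a set of lowercase keywords."""
    return frozenset(record.lower().split())


def build_query_pool(local_records, max_terms=2):
    """Derive candidate keyword queries from the local records themselves."""
    pool = set()
    for rec in local_records:
        words = sorted(tokenize(rec))
        for n in range(1, max_terms + 1):
            pool.update(frozenset(c) for c in combinations(words, n))
    return pool


def search_hidden(hidden_records, query, k):
    """Stand-in for a top-k keyword search interface (assumption):
    conjunctive keyword match, returning only the first k hits."""
    hits = [r for r in hidden_records if query <= tokenize(r)]
    return hits[:k]


def estimated_benefit(query, uncovered, k):
    """Naive estimate (assumption): number of still-uncovered local
    records matching the query, capped at k because of the top-k limit."""
    return min(k, sum(1 for r in uncovered if query <= tokenize(r)))


def crawl_sketch(local_records, hidden_records, budget, k):
    """Greedily issue at most `budget` queries; return (crawled, issued)."""
    uncovered = set(local_records)
    pool = build_query_pool(local_records)
    crawled, issued = [], 0
    while issued < budget and pool and uncovered:
        # Break ties deterministically using the sorted keyword tuple.
        best = max(
            pool,
            key=lambda q: (estimated_benefit(q, uncovered, k), tuple(sorted(q))),
        )
        if estimated_benefit(best, uncovered, k) == 0:
            break  # no remaining query is expected to cover anything new
        pool.remove(best)
        issued += 1
        for rec in search_hidden(hidden_records, best, k):
            if rec in uncovered:  # exact-match stand-in for record matching
                uncovered.discard(rec)
                crawled.append(rec)
    return crawled, issued


if __name__ == "__main__":
    local = ["thai house restaurant", "thai noodle house", "steak house grill"]
    hidden = local + ["sushi bar downtown"]
    print(crawl_sketch(local, hidden, budget=2, k=2))
```

Capping the estimate at `k` is the simplest way to reflect the top-k limit: a query that matches many local records can still return at most `k` of them, so very general keywords are not automatically preferred over more specific ones.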
Authors
1. Pei Wang
2. Ryan Shea
3. Jiannan Wang
4. Eugene Wu
Incoming Citations (Sorted by Pagerank)
Showing 3 of 3 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 8,116 | LakeBench: A Benchmark for Discovering Joinable and Unionable Tables in Data Lakes | 2024 | VLDB | 4.581507e-05 |
| 8,696 | Effective Entity Augmentation By Querying External Data Sources | 2023 | VLDB | 4.4660032e-05 |
| 9,273 | ActiveDeeper: A Model-based Active Data Enrichment System | 2020 | VLDB | 4.3649603e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 18 of 18 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 409 | Focused Crawling Using Context Graphs | 2000 | VLDB | 0.00023944056 |
| 12,231 | Optimizing Content Freshness of Relations Extracted From the Web Using Keyword Search | 2010 | SIGMOD | 4.1945683e-05 |
| 1,492 | Distributed Search over the Hidden Web: Hierarchical Database Sampling and Selection | 2002 | VLDB | 0.00011694396 |
| 9,433 | Exploration of Deep Web Repositories | 2011 | VLDB | 4.3431757e-05 |
| 234 | Crawling the Hidden Web | 2001 | VLDB | 0.00032018108 |
| 7,768 | Accurate and Efficient Crawling for Relevant Websites | 2004 | VLDB | 4.6563056e-05 |
| 3,950 | Probe, Count, and Classify: Categorizing Hidden-Web Databases | 2001 | SIGMOD | 6.5953844e-05 |
| 1,537 | Google's Deep-Web Crawl | 2008 | VLDB | 0.00011465704 |
| 11,722 | Deeper: A Data Enrichment System Powered by Deep Web | 2018 | SIGMOD | 4.1945683e-05 |
| 9,548 | Optimal Algorithms for Crawling a Hidden Database in the Web | 2012 | VLDB | 4.3258142e-05 |