Database Paper Browser


Progressive Deep Web Crawling Through Keyword Queries For Data Enrichment

Summary: SmartCrawl enables progressive deep-web crawling through keyword queries to enrich a local database under a top-k result limit and a query budget. It builds a query pool from the local data and iteratively issues the queries with the highest estimated benefit to maximize coverage. Novel benefit-estimation techniques and optimizations handle the ΔD effect (local records with no match in the hidden database) and the top-k effect (queries whose matches exceed the k results returned). (summarized by gpt-5-nano on Feb 09 2026)
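The iterative query-selection loop can be made concrete with a minimal Python sketch of greedy, budget-limited crawling. This is not SmartCrawl's algorithm, and several simplifications are assumed. The names `build_query_pool`, `estimated_benefit`, `greedy_crawl` and `make_hidden_search` are hypothetical. The query pool is simply the set of single keywords found in local records. The hidden database is simulated in memory and returns at most k matches per query. Matching local records to hidden records is reduced to shared record ids. The benefit estimate naively counts uncovered local records that contain the keyword, so it ignores exactly the ΔD and top-k effects that the paper's estimators are designed to correct.

```python
from typing import Callable, Dict, List, Set

def build_query_pool(local_db: Dict[int, str]) -> Set[str]:
    """Hypothetical pool: every single keyword that appears in some local record."""
    pool: Set[str] = set()
    for text in local_db.values():
        pool.update(text.lower().split())
    return pool

def estimated_benefit(query: str, local_db: Dict[int, str], covered: Set[int]) -> int:
    """Naive estimate: number of uncovered local records containing the keyword.
    It ignores the top-k limit and local records absent from the hidden DB (ΔD)."""
    return sum(
        1 for rid, text in local_db.items()
        if rid not in covered and query in text.lower().split()
    )

def make_hidden_search(hidden_db: Dict[int, str], k: int) -> Callable[[str], List[int]]:
    """Simulated keyword-search interface that returns at most k matching record ids."""
    def search(query: str) -> List[int]:
        hits = [rid for rid, text in sorted(hidden_db.items())
                if query in text.lower().split()]
        return hits[:k]  # top-k restriction
    return search

def greedy_crawl(local_db: Dict[int, str],
                 search: Callable[[str], List[int]],
                 budget: int) -> Set[int]:
    """Issue up to `budget` queries, each time picking the highest-benefit keyword."""
    pool = build_query_pool(local_db)
    covered: Set[int] = set()
    issued: Set[str] = set()
    for _ in range(budget):
        candidates = pool - issued
        if not candidates:
            break
        # Ties on benefit are broken by keyword so that the run is deterministic.
        best = max(candidates,
                   key=lambda q: (estimated_benefit(q, local_db, covered), q))
        if estimated_benefit(best, local_db, covered) == 0:
            break
        issued.add(best)
        for rid in search(best):
            if rid in local_db:  # assumption: shared ids stand in for entity matching
                covered.add(rid)
    return covered

# Toy data: local record 4 has no counterpart in the hidden DB (it belongs to ΔD).
local = {1: "deep web crawling", 2: "web tables", 3: "entity matching", 4: "deep learning"}
hidden = {1: "deep web crawling", 2: "web tables", 3: "entity matching", 5: "web search"}
search = make_hidden_search(hidden, k=2)

if __name__ == "__main__":
    print(sorted(greedy_crawl(local, search, budget=3)))
```

With a budget of 3, the loop issues "web", "matching" and then "learning". The third query returns nothing because record 4 has no hidden match. The naive estimator therefore spends budget on a ΔD record, which is the kind of error that better benefit estimation aims to avoid.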

Paper ID: 5711
Venue: SIGMOD
Year: 2019
Pagerank: 4.4702119e-05
Overall Rank: 8,678 | 39.63%
DOI: 10.1145/3299869.3319899



Incoming Citations (Sorted by Pagerank)

Showing 3 of 3 citing papers.


Outgoing Citations (Sorted by Pagerank)

Showing 18 of 18 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
181 | Mining Frequent Patterns without Candidate Generation | 2000 | SIGMOD | 0.00036992674
234 | Crawling the Hidden Web | 2001 | VLDB | 0.00032018108
420 | InfoGather: Entity Augmentation and Attribute Discovery By Holistic Matching with Web Tables | 2012 | SIGMOD | 0.00023719065
518 | Data Integration for the Relational Web | 2009 | VLDB | 0.00021158934
672 | An Interactive Clustering-based Approach to Integrating Source Query Interfaces on the Deep Web | 2004 | SIGMOD | 0.00018355746
818 | Finding Related Tables | 2012 | SIGMOD | 0.00016311524
1,367 | Answering Table Queries on the Web using Column Keywords | 2012 | VLDB | 0.00012349783
1,492 | Distributed Search over the Hidden Web: Hierarchical Database Sampling and Selection | 2002 | VLDB | 0.00011694396
1,537 | Google's Deep-Web Crawl | 2008 | VLDB | 0.00011465704
1,612 | Detecting Data Errors: Where are we and what needs to be done? | 2016 | VLDB | 0.00011142794
3,229 | InfoGather+: Semantic Matching and Annotation of Numeric and Time-Varying Attributes in Web Tables | 2013 | SIGMOD | 7.3393682e-05
3,724 | Toward Large Scale Integration: Building a MetaQuerier over Databases on the Web | 2005 | CIDR | 6.8173288e-05
5,140 | A Random Walk Approach to Sampling Hidden Databases | 2007 | SIGMOD | 5.668209e-05
7,890 | Mining a Search Engine’s Corpus: Efficient Yet Unbiased Sampling and Aggregate Estimation | 2011 | SIGMOD | 4.6249533e-05
8,684 | Unbiased Estimation of Size and Other Aggregates Over Hidden Web Databases | 2010 | SIGMOD | 4.4677591e-05
9,548 | Optimal Algorithms for Crawling a Hidden Database in the Web | 2012 | VLDB | 4.3258142e-05
9,549 | Attribute Domain Discovery for Hidden Web Databases | 2011 | SIGMOD | 4.3258142e-05
11,722 | Deeper: A Data Enrichment System Powered by Deep Web | 2018 | SIGMOD | 4.1945683e-05

Semantically Similar Papers