Database Paper Browser


To Search or to Crawl? Towards a Query Optimizer for Text-Centric Tasks

Summary: The paper presents a cost-based optimizer for text-centric tasks that chooses between crawl-based and query-based execution plans using a formal model of execution time and recall. It estimates task-specific parameters with random-graph theory and statistics, and validates the approach with large-scale experiments on three tasks and multiple real-life data sets. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 3757
Venue: SIGMOD
Year: 2006
Pagerank: 0.00017064615
Overall Rank: 759 | 94.73%
DOI: -

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 16 of 16 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
287 | Declarative Information Extraction Using Datalog with Embedded Extraction Predicates | 2007 | VLDB | 0.00028971272
652 | On the Provenance of Non-Answers to Queries over Extracted Data | 2008 | VLDB | 0.00018634477
1,722 | Building Structured Web Community Portals: A Top-Down, Compositional, and Incremental Approach | 2007 | VLDB | 0.00010757784
2,012 | DB&IR: Both Sides Now (Extended Abstract) | 2007 | SIGMOD | 9.7951657e-05
2,319 | Expressive and Flexible Access to Web-Extracted Data: A Keyword-based Structured Query Language | 2010 | SIGMOD | 9.0387108e-05
2,984 | Efficiently Incorporating User Feedback into Information Extraction and Integration Programs | 2009 | SIGMOD | 7.7796344e-05
4,873 | Power-Law Based Estimation of Set Similarity Join Size | 2009 | VLDB | 5.8602304e-05
5,379 | Scalable Ad-hoc Entity Extraction from Text Collections | 2008 | VLDB | 5.5405989e-05
5,652 | From Information to Knowledge: Harvesting Entities and Relationships from Web Sources | 2010 | PODS | 5.3903671e-05
5,672 | Effective Keyword-based Selection of Relational Databases | 2007 | SIGMOD | 5.3784128e-05
7,280 | I4E: Interactive Investigation of Iterative Information Extraction | 2010 | SIGMOD | 4.778826e-05
8,148 | When Speed Has a Price: Fast Information Extraction Using Approximate Algorithms | 2013 | VLDB | 4.5754467e-05
9,635 | Optimizing Complex Extraction Programs over Evolving Text Data | 2009 | SIGMOD | 4.3118125e-05
11,240 | Autonomously Computable Information Extraction | 2023 | VLDB | 4.1945683e-05
11,888 | Synthesizing Data Programs | 2015 | CIDR | 4.1945683e-05
12,028 | D-Hive: Data Bees Pollinating RDF, Text, and Time | 2013 | CIDR | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 5 of 5 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.


Semantically Similar Papers