To Search or to Crawl? Towards a Query Optimizer for Text-Centric Tasks
Summary: A cost-based optimizer for text-centric tasks chooses between crawl-based and query-based plans using a formal model of time and recall. It uses random-graph theory and statistics to estimate task-specific parameters, and is validated with large-scale experiments on three tasks and multiple real-life data sets. (summarized by gpt-5-nano on Feb 09 2026)
Incoming Citations (Sorted by Pagerank)
Showing 16 of 16 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 5 of 5 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 155 | Robust and Efficient Fuzzy Match for Online Data Cleaning | 2003 | SIGMOD | 0.00040637896 |
| 409 | Focused Crawling Using Context Graphs | 2000 | VLDB | 0.00023944056 |
| 530 | Random Sampling for Histogram Construction: How much is enough? | 1998 | SIGMOD | 0.00020803682 |
| 1,131 | Automatic Discovery of Language Models for Text Databases | 1999 | SIGMOD | 0.00013777757 |
| 1,492 | Distributed Search over the Hidden Web: Hierarchical Database Sampling and Selection | 2002 | VLDB | 0.00011694396 |