Structured Querying of Web Text: A Technical Challenge
Summary: ExDB is an extraction database that combines information extraction with a SQL-like data model and query language to enable structured queries over Web text. The paper details the core challenges (uncertain and heterogeneous extractions, and indexing and query optimization at Web scale) and validates its ideas on a 90M-page prototype. (summarized by gpt-5-mini on Feb 09 2026)
Authors
1. Michael J. Cafarella
2. Christopher Ré
3. Dan Suciu
4. Oren Etzioni
5. Michele Banko
Incoming Citations (Sorted by Pagerank)
Showing 11 of 11 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 12 of 12 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 587 | Extracting Structured Data from Web Pages | 2003 | SIGMOD | 0.00019648348 |
| 7,326 | Answering Web Queries Using Structured Data Sources | 2009 | SIGMOD | 4.7612871e-05 |
| 13,626 | Managing Information Extraction [Tutorial Outline] | 2006 | SIGMOD | - |
| 1,851 | An Analysis of Structured Data on the Web | 2012 | VLDB | 0.00010327871 |
| 11,844 | Potential and Pitfalls of Domain-Specific Information Extraction at Web Scale | 2016 | SIGMOD | 4.1945683e-05 |
| 3,931 | Extracting and Querying a Comprehensive Web Database | 2009 | CIDR | 6.6193836e-05 |
| 5,774 | A Hierarchical Approach to Model Web Query Interfaces for Web Source Integration | 2009 | VLDB | 5.3313642e-05 |
| 2,771 | A Relational Approach to Incrementally Extracting and Querying Structure in Unstructured Data | 2007 | VLDB | 8.1421432e-05 |
| 13,720 | QXtract: A Building Block for Efficient Information Extraction from Text Databases | 2003 | SIGMOD | - |
| 2,319 | Expressive and Flexible Access to Web-Extracted Data: A Keyword-based Structured Query Language | 2010 | SIGMOD | 9.0387108e-05 |