Automatic Web-Scale Information Extraction
Summary: A Yahoo! demo of web-scale information extraction. Given new websites whose semi-structured data maps to predefined schemas, the system automatically populates schema objects by extracting values at scale. The demo shows end-to-end, schema-driven extraction that is robust to site variability and works across domains. (summarized by gpt-5-nano on Feb 09 2026)
Incoming Non-self Citations Over Time
Authors
1. Philip Bohannon
2. Nilesh Dalvi
3. Yuval Filmus
4. Nori Jacoby
5. Sathiya Keerthi
6. Alok Kirpal
Incoming Citations (Sorted by Pagerank)
Showing 2 of 2 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 10,126 | Visual Template Inference for Data Extraction from Documents | 2026 | SIGMOD | 4.1945683e-05 |
| 12,044 | Knowledge Harvesting in the Big-Data Era | 2013 | SIGMOD | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 6 of 6 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 107 | WebTables: Exploring the Power of Tables on the Web | 2008 | VLDB | 0.00048377684 |
| 1,132 | Building light-weight wrappers for legacy Web data-sources using W4F | 1999 | VLDB | 0.00013777657 |
| 1,317 | Harvesting Relational Tables from Lists on the Web | 2009 | VLDB | 0.00012625853 |
| 1,585 | Answering Table Augmentation Queries from Unstructured Lists on the Web | 2009 | VLDB | 0.00011255098 |
| 1,722 | Building Structured Web Community Portals: A Top-Down, Compositional, and Incremental Approach | 2007 | VLDB | 0.00010757784 |
| 3,678 | Automatic Wrappers for Large Scale Web Extraction | 2011 | VLDB | 6.8517545e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 11,240 | Autonomously Computable Information Extraction | 2023 | VLDB | 4.1945683e-05 |
| 5,652 | From Information to Knowledge: Harvesting Entities and Relationships from Web Sources | 2010 | PODS | 5.3903671e-05 |
| 1,395 | Structured Querying of Web Text: A Technical Challenge | 2007 | CIDR | 0.00012207039 |
| 2,633 | Schema Extraction for Tabular Data on the Web | 2013 | VLDB | 8.4063569e-05 |
| 2,617 | Extraction and Integration of Partially Overlapping Web Sources | 2013 | VLDB | 8.4462621e-05 |
| 7,326 | Answering Web Queries Using Structured Data Sources | 2009 | SIGMOD | 4.7612871e-05 |
| 12,590 | An Automatic Data Grabber for Large Web Sites | 2004 | VLDB | 4.1945683e-05 |
| 1,851 | An Analysis of Structured Data on the Web | 2012 | VLDB | 0.00010327871 |
| 1,221 | A Web of Concepts | 2009 | PODS | 0.00013219242 |
| 3,678 | Automatic Wrappers for Large Scale Web Extraction | 2011 | VLDB | 6.8517545e-05 |