Database Paper Browser

Automatic Web-Scale Information Extraction

Summary: A Yahoo! demonstration of web-scale information extraction. Given new websites whose semi-structured data is mapped to predefined schemas, the system automatically populates schema objects by extracting values at scale. The demo shows end-to-end, schema-driven extraction that is robust to site variability and works across domains. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 4563
Venue: SIGMOD
Year: 2012
Pagerank: 4.5435639e-05
Overall Rank: 8,307 | 42.22%
DOI: -

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 2 of 2 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
10,126 | Visual Template Inference for Data Extraction from Documents | 2026 | SIGMOD | 4.1945683e-05
12,044 | Knowledge Harvesting in the Big-Data Era | 2013 | SIGMOD | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 6 of 6 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Semantically Similar Papers

Overall Rank | Paper | Year | Venue | Pagerank
11,240 | Autonomously Computable Information Extraction | 2023 | VLDB | 4.1945683e-05
5,652 | From Information to Knowledge: Harvesting Entities and Relationships from Web Sources | 2010 | PODS | 5.3903671e-05
1,395 | Structured Querying of Web Text: A Technical Challenge | 2007 | CIDR | 0.00012207039
2,633 | Schema Extraction for Tabular Data on the Web | 2013 | VLDB | 8.4063569e-05
2,617 | Extraction and Integration of Partially Overlapping Web Sources | 2013 | VLDB | 8.4462621e-05
7,326 | Answering Web Queries Using Structured Data Sources | 2009 | SIGMOD | 4.7612871e-05
12,590 | An Automatic Data Grabber for Large Web Sites | 2004 | VLDB | 4.1945683e-05
1,851 | An Analysis of Structured Data on the Web | 2012 | VLDB | 0.00010327871
1,221 | A Web of Concepts | 2009 | PODS | 0.00013219242
3,678 | Automatic Wrappers for Large Scale Web Extraction | 2011 | VLDB | 6.8517545e-05