Extracting and Querying a Comprehensive Web Database

Summary: Omnivore builds a comprehensive, web-scale entity-relationship database by running multiple domain-independent extractors (for tables, text, and relations) in parallel over a web crawl, then merging their heterogeneous outputs so that each model's blind spots are covered by the others. It provides SQL-like and search interfaces, supports user corrections, and automatically chooses an output model and schema for rendering results without requiring prior metadata. (summarized by gpt-5-mini on Feb 09 2026)
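This page does not describe how Omnivore actually merges its extractors' outputs, so the following is only a hypothetical sketch of the general idea of merging heterogeneous extractor output. It assumes each extractor emits (subject, relation, object) triples, applies a made-up normalization, takes the union so that a fact missed by one extractor can still be contributed by another, and records which extractors agree on each fact as a crude confidence signal. The names `Triple`, `normalize`, `merge_extractions`, `rank_by_support`, and the extractor labels are all illustrative assumptions, not part of the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical fact representation: (subject, relation, object).
# This is an illustrative assumption, not Omnivore's actual data model.
Triple = Tuple[str, str, str]


def normalize(triple: Triple) -> Triple:
    """Hypothetical normalization: trim whitespace and lowercase each field."""
    subject, relation, obj = triple
    return (subject.strip().lower(), relation.strip().lower(), obj.strip().lower())


def merge_extractions(outputs: Dict[str, Iterable[Triple]]) -> Dict[Triple, Set[str]]:
    """Union the triples from several extractors, remembering which
    extractors produced each normalized fact (its provenance)."""
    merged: Dict[Triple, Set[str]] = defaultdict(set)
    for extractor_name, triples in outputs.items():
        for triple in triples:
            merged[normalize(triple)].add(extractor_name)
    return dict(merged)


def rank_by_support(merged: Dict[Triple, Set[str]]) -> List[Tuple[Triple, int]]:
    """Order facts by how many extractors agree on them (most support first),
    breaking ties alphabetically by the triple itself."""
    return sorted(
        ((fact, len(sources)) for fact, sources in merged.items()),
        key=lambda item: (-item[1], item[0]),
    )


if __name__ == "__main__":
    # Toy outputs from three hypothetical extractors.
    outputs = {
        "table_extractor": [("Seattle", "located_in", "Washington")],
        "text_extractor": [
            ("seattle ", "located_in", "washington"),
            ("Paris", "capital_of", "France"),
        ],
        "relation_extractor": [("Paris", "capital_of", "France")],
    }
    for fact, support in rank_by_support(merge_extractions(outputs)):
        print(support, fact)
```

Running the toy example prints each fact with a support of 2, because each one was found by two of the three extractors even though no single extractor found both. Counting agreement is just one simple design choice for this sketch; a real system could weight extractors differently or use user corrections to adjust confidence.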

Paper ID: 105
Venue: CIDR
Year: 2009
Pagerank: 6.6130211e-05
Overall Rank: 3,935 | 72.66%
DOI: -


Authors

1. Michael J. Cafarella

Incoming Citations (Sorted by Pagerank)

Showing 4 of 4 citing papers.

Rank	Citing Paper	Year	Venue	Pagerank
5,662	From Information to Knowledge: Harvesting Entities and Relationships from Web Sources	2010	PODS	5.3854745e-05
9,366	IQ: The Case for Iterative Querying for Knowledge	2011	CIDR	4.3467888e-05
12,052	Knowledge Harvesting in the Big-Data Era	2013	SIGMOD	4.1905499e-05
12,252	DoCQS: A Prototype System for Supporting Data-oriented Content Query	2010	SIGMOD	4.1905499e-05

Outgoing Citations (Sorted by Pagerank)

Showing 7 of 7 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank	Cited Paper	Year	Venue	Pagerank
62	Freebase: A Collaboratively Created Graph Database For Structuring Human Knowledge	2008	SIGMOD	0.00064239035
108	WebTables: Exploring the Power of Tables on the Web	2008	VLDB	0.00048345996
188	Applying Model Management to Classical Meta Data Problems	2003	CIDR	0.00035935715
228	Reference Reconciliation in Complex Information Spaces	2005	SIGMOD	0.00032266415
1,398	Structured Querying of Web Text: A Technical Challenge	2007	CIDR	0.00012201166
1,716	Building Structured Web Community Portals: A Top-Down, Compositional, and Incremental Approach	2007	VLDB	0.00010772386
1,982	Snowball: A Prototype System for Extracting Relations from Large Text Collections	2001	SIGMOD	9.8709804e-05

Semantically Similar Papers

Overall Rank	Paper	Year	Venue	Pagerank
519	Data Integration for the Relational Web	2009	VLDB	0.00021148006
108	WebTables: Exploring the Power of Tables on the Web	2008	VLDB	0.00048345996
2,607	Extraction and Integration of Partially Overlapping Web Sources	2013	VLDB	8.4615436e-05
6,137	DIADEM: Thousands of Websites to a Single Database	2014	VLDB	5.190481e-05
11,852	Potential and Pitfalls of Domain-Specific Information Extraction at Web Scale	2016	SIGMOD	4.1905499e-05
3,721	Toward Large Scale Integration: Building a MetaQuerier over Databases on the Web	2005	CIDR	6.8139343e-05
2,324	Expressive and Flexible Access to Web-Extracted Data: A Keyword-based Structured Query Language	2010	SIGMOD	9.0289103e-05
12,266	ObjectRunner: Lightweight, Targeted Extraction and Querying of Structured Web Data	2010	VLDB	4.1905499e-05
5,783	A Hierarchical Approach to Model Web Query Interfaces for Web Source Integration	2009	VLDB	5.3262443e-05
1,398	Structured Querying of Web Text: A Technical Challenge	2007	CIDR	0.00012201166