Database Paper Browser

From Information to Knowledge: Harvesting Entities and Relationships from Web Sources

Summary: Tutorial surveying methods for automatically harvesting entities, classes, relations, and temporal contexts from semi-structured and natural-language Web sources such as Wikipedia into high-precision, high-recall knowledge bases (e.g., DBpedia, YAGO). Discusses extraction and integration pipelines, maintenance, evaluation, and open research challenges. (summarized by gpt-5-mini on Feb 09 2026)

Paper ID: 1508
Venue: PODS
Year: 2010
Pagerank: 5.3903671e-05
Overall Rank: 5,652 | 60.69%
DOI: -

Incoming Citations (Sorted by Pagerank)

Showing 3 of 3 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
667 | Incremental Knowledge Base Construction Using DeepDive | 2015 | VLDB | 0.00018440557
6,992 | An Efficient Publish/Subscribe Index for E-Commerce Databases | 2014 | VLDB | 4.8701339e-05
12,044 | Knowledge Harvesting in the Big-Data Era | 2013 | SIGMOD | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 24 of 24 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
107 | WebTables: Exploring the Power of Tables on the Web | 2008 | VLDB | 0.00048377684
287 | Declarative Information Extraction Using Datalog with Embedded Extraction Predicates | 2007 | VLDB | 0.00028971272
322 | Record Linkage: Similarity Measures and Algorithms | 2006 | SIGMOD | 0.00027518768
518 | Data Integration for the Relational Web | 2009 | VLDB | 0.00021158934
533 | RoadRunner: Towards Automatic Data Extraction from Large Web Sites | 2001 | VLDB | 0.00020757722
587 | Extracting Structured Data from Web Pages | 2003 | SIGMOD | 0.00019648348
721 | Data Integration with Uncertainty | 2007 | VLDB | 0.00017570539
759 | To Search or to Crawl? Towards a Query Optimizer for Text-Centric Tasks | 2006 | SIGMOD | 0.00017064615
1,095 | The Lixto Data Extraction Project - Back and Forth between Theory and Practice | 2004 | PODS | 0.00014126427
1,140 | EntityRank: Searching Entities Directly and Holistically | 2007 | VLDB | 0.00013720706
1,213 | RDF-3X: a RISC-style Engine for RDF | 2008 | VLDB | 0.0001325231
1,252 | Principles of Dataspace Systems | 2006 | PODS | 0.00013033186
1,317 | Harvesting Relational Tables from Lists on the Web | 2009 | VLDB | 0.00012625853
1,722 | Building Structured Web Community Portals: A Top-Down, Compositional, and Incremental Approach | 2007 | VLDB | 0.00010757784
1,950 | Leveraging Data and Structure in Ontology Integration | 2007 | SIGMOD | 9.9756731e-05
1,980 | Snowball: A Prototype System for Extracting Relations from Large Text Collections | 2001 | SIGMOD | 9.8785341e-05
2,066 | DBLife: A Community Information Management Platform for the Database Research Community | 2007 | CIDR | 9.6399561e-05
2,698 | Visual Web Information Extraction with Lixto | 2001 | VLDB | 8.2753317e-05
2,771 | A Relational Approach to Incrementally Extracting and Querying Structure in Unstructured Data | 2007 | VLDB | 8.1421432e-05
3,931 | Extracting and Querying a Comprehensive Web Database | 2009 | CIDR | 6.6193836e-05
3,985 | A First Tutorial on Dataspaces | 2008 | VLDB | 6.5626153e-05
4,156 | Uncertainty Management in Rule-Based Information Extraction Systems | 2009 | SIGMOD | 6.3999205e-05
4,951 | Mining Document Collections to Facilitate Accurate Approximate Entity Matching | 2009 | VLDB | 5.8100413e-05
9,635 | Optimizing Complex Extraction Programs over Evolving Text Data | 2009 | SIGMOD | 4.3118125e-05

Semantically Similar Papers