From Information to Knowledge: Harvesting Entities and Relationships from Web Sources
Summary: Tutorial surveying methods to automatically harvest entities, classes, relations and temporal contexts from semi-structured and natural-language Web sources (e.g., Wikipedia, DBpedia, YAGO) into high-precision, high-recall knowledge bases. Discusses extraction/integration pipelines, maintenance, evaluation, and open research challenges.
(summarized by gpt-5-mini on Feb 09 2026)
- Paper ID: 1508
- Venue: PODS
- Year: 2010
- Pagerank: 5.3903671e-05
- Overall Rank: 5,652 | 60.69%
Incoming Non-self Citations Over Time
Incoming Citations (Sorted by Pagerank)
Showing 3 of 3 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 24 of 24 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 107 | WebTables: Exploring the Power of Tables on the Web | 2008 | VLDB | 0.00048377684 |
| 287 | Declarative Information Extraction Using Datalog with Embedded Extraction Predicates | 2007 | VLDB | 0.00028971272 |
| 322 | Record Linkage: Similarity Measures and Algorithms | 2006 | SIGMOD | 0.00027518768 |
| 518 | Data Integration for the Relational Web | 2009 | VLDB | 0.00021158934 |
| 533 | RoadRunner: Towards Automatic Data Extraction from Large Web Sites | 2001 | VLDB | 0.00020757722 |
| 587 | Extracting Structured Data from Web Pages | 2003 | SIGMOD | 0.00019648348 |
| 721 | Data Integration with Uncertainty | 2007 | VLDB | 0.00017570539 |
| 759 | To Search or to Crawl? Towards a Query Optimizer for Text-Centric Tasks | 2006 | SIGMOD | 0.00017064615 |
| 1,095 | The Lixto Data Extraction Project - Back and Forth between Theory and Practice | 2004 | PODS | 0.00014126427 |
| 1,140 | EntityRank: Searching Entities Directly and Holistically | 2007 | VLDB | 0.00013720706 |
| 1,213 | RDF-3X: a RISC-style Engine for RDF | 2008 | VLDB | 0.0001325231 |
| 1,252 | Principles of Dataspace Systems | 2006 | PODS | 0.00013033186 |
| 1,317 | Harvesting Relational Tables from Lists on the Web | 2009 | VLDB | 0.00012625853 |
| 1,722 | Building Structured Web Community Portals: A Top-Down, Compositional, and Incremental Approach | 2007 | VLDB | 0.00010757784 |
| 1,950 | Leveraging Data and Structure in Ontology Integration | 2007 | SIGMOD | 9.9756731e-05 |
| 1,980 | Snowball: A Prototype System for Extracting Relations from Large Text Collections | 2001 | SIGMOD | 9.8785341e-05 |
| 2,066 | DBLife: A Community Information Management Platform for the Database Research Community | 2007 | CIDR | 9.6399561e-05 |
| 2,698 | Visual Web Information Extraction with Lixto* | 2001 | VLDB | 8.2753317e-05 |
| 2,771 | A Relational Approach to Incrementally Extracting and Querying Structure in Unstructured Data | 2007 | VLDB | 8.1421432e-05 |
| 3,931 | Extracting and Querying a Comprehensive Web Database | 2009 | CIDR | 6.6193836e-05 |
| 3,985 | A First Tutorial on Dataspaces | 2008 | VLDB | 6.5626153e-05 |
| 4,156 | Uncertainty Management in Rule-Based Information Extraction Systems | 2009 | SIGMOD | 6.3999205e-05 |
| 4,951 | Mining Document Collections to Facilitate Accurate Approximate Entity Matching | 2009 | VLDB | 5.8100413e-05 |
| 9,635 | Optimizing Complex Extraction Programs over Evolving Text Data | 2009 | SIGMOD | 4.3118125e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 9,423 | Database Principles in Information Extraction | 2014 | PODS | 4.3441378e-05 |
| 8,696 | Effective Entity Augmentation By Querying External Data Sources | 2023 | VLDB | 4.4660032e-05 |
| 3,229 | InfoGather+: Semantic Matching and Annotation of Numeric and Time-Varying Attributes in Web Tables | 2013 | SIGMOD | 7.3393682e-05 |
| 11,906 | Knowledge Curation and Knowledge Fusion: Challenges, Models, and Applications | 2015 | SIGMOD | 4.1945683e-05 |
| 4,630 | Knowledge Graphs 2021: A Data Odyssey | 2021 | VLDB | 6.0348379e-05 |
| 11,775 | Building Structured Databases of Factual Knowledge from Massive Text Corpora | 2017 | SIGMOD | 4.1945683e-05 |
| 420 | InfoGather: Entity Augmentation and Attribute Discovery By Holistic Matching with Web Tables | 2012 | SIGMOD | 0.00023719065 |
| 4,304 | NAGA: Harvesting, Searching and Ranking Knowledge | 2008 | SIGMOD | 6.2885419e-05 |
| 364 | Annotating and Searching Web Tables Using Entities, Types and Relationships | 2010 | VLDB | 0.00025637562 |
| 12,044 | Knowledge Harvesting in the Big-Data Era | 2013 | SIGMOD | 4.1945683e-05 |