Building Structured Databases of Factual Knowledge from Massive Text Corpora

Summary: Minimally supervised, domain- and language-independent extraction of entities, relations, and attributes to build structured databases (StructDBs) from text corpora. Demonstrates scalable cross-domain StructDB construction across news, social media, biomedical, and business data with reduced labeling effort, enabling exploration and knowledge discovery. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 5340
Venue: SIGMOD
Year: 2017
Pagerank: 4.1905499e-05
Overall Rank: 11,783 | 18.11%
DOI: 10.1145/3035918.3054781

Incoming Non-self Citations Over Time

No non-self incoming citations found for this paper in this database.

Incoming Citations (Sorted by Pagerank)

Showing 0 of 0 citing papers.

Outgoing Citations (Sorted by Pagerank)

Showing 16 of 16 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank	Cited Paper	Year	Venue	Pagerank
62	Freebase: A Collaboratively Created Graph Database For Structuring Human Knowledge	2008	SIGMOD	0.00064239035
108	WebTables: Exploring the Power of Tables on the Web	2008	VLDB	0.00048345996
365	Annotating and Searching Web Tables Using Entities, Types and Relationships	2010	VLDB	0.00025616694
420	InfoGather: Entity Augmentation and Attribute Discovery By Holistic Matching with Web Tables	2012	SIGMOD	0.00023700634
668	Incremental Knowledge Base Construction Using DeepDive	2015	VLDB	0.00018428925
1,068	Probase: A Probabilistic Taxonomy for Text Understanding	2012	SIGMOD	0.00014316508
3,217	Natural Language Question Answering over RDF — A Graph Data Driven Approach	2014	SIGMOD	7.3672608e-05
3,293	Biperpedia: An Ontology for Search Applications	2014	VLDB	7.2598242e-05
3,826	Automatic Discovery of Attributes in Relational Databases	2011	SIGMOD	6.7204879e-05
4,105	Extracting Databases from Dark Data with DeepDive	2016	SIGMOD	6.4409563e-05
5,531	Towards the Web of Concepts: Extracting Concepts from Large Datasets	2010	VLDB	5.4562198e-05
7,615	Mining Attribute-structure Correlated Patterns in Large Attributed Graphs	2012	VLDB	4.6902598e-05
7,913	Mining Quality Phrases from Massive Text Corpora	2015	SIGMOD	4.6139203e-05
7,919	DEXTER: Large-Scale Discovery and Extraction of Product Specifications on the Web	2015	VLDB	4.6123189e-05
9,429	Database Principles in Information Extraction	2014	PODS	4.3399748e-05
11,962	Scalable Topical Phrase Mining from Text Corpora	2015	VLDB	4.1905499e-05

Semantically Similar Papers

Overall Rank	Paper	Year	Venue	Pagerank
13,640	Managing Information Extraction [Tutorial Outline]	2006	SIGMOD	-
7,913	Mining Quality Phrases from Massive Text Corpora	2015	SIGMOD	4.6139203e-05
12,052	Knowledge Harvesting in the Big-Data Era	2013	SIGMOD	4.1905499e-05
5,382	Scalable Ad-hoc Entity Extraction from Text Collections	2008	VLDB	5.5358382e-05
10,976	Unstructured Data Fusion for Schema and Data Extraction	2024	SIGMOD	4.1905499e-05
5,538	Data-Driven Domain Discovery for Structured Datasets	2020	VLDB	5.4520759e-05
11,852	Potential and Pitfalls of Domain-Specific Information Extraction at Web Scale	2016	SIGMOD	4.1905499e-05
11,979	Mining Latent Entity Structures from Massive Unstructured and Interconnected Data	2014	SIGMOD	4.1905499e-05
11,855	Automatic Entity Recognition and Typing in Massive Text Data	2016	SIGMOD	4.1905499e-05
9,136	TextCube: Automated Construction and Multidimensional Exploration	2019	VLDB	4.3843441e-05