Towards the Web of Concepts: Extracting Concepts from Large Datasets

Summary: The paper reframes concept extraction as a market-basket mining problem over large corpora, with the goal of building a Web of Concepts for search. It applies market-basket-style support and confidence measures to extract high-precision concept sequences, and it is evaluated on AOL-scale query logs. (summarized by gpt-5-nano on Feb 09 2026)
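
To illustrate what support and confidence can mean for word sequences, here is a minimal Python sketch. It is not the paper's algorithm: the definitions (support as the raw n-gram count, confidence as the n-gram count divided by the count of its prefix), the function names, the thresholds, and the toy queries are all assumptions chosen for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token tuples in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(queries, max_n):
    """Count occurrences of every n-gram (1..max_n) across all queries."""
    counts = Counter()
    for query in queries:
        tokens = query.lower().split()
        for n in range(1, max_n + 1):
            counts.update(ngrams(tokens, n))
    return counts


def candidate_concepts(queries, max_n=3, min_support=2, min_confidence=0.6):
    """Keep multi-word n-grams that pass both (assumed) thresholds.

    support    = number of occurrences of the n-gram
    confidence = count(n-gram) / count(n-gram without its last word)
    """
    counts = count_ngrams(queries, max_n)
    results = []
    for gram, support in counts.items():
        if len(gram) < 2 or support < min_support:
            continue
        confidence = support / counts[gram[:-1]]
        if confidence >= min_confidence:
            results.append((" ".join(gram), support, confidence))
    # Highest support first; ties broken alphabetically.
    return sorted(results, key=lambda r: (-r[1], r[0]))


# Hypothetical toy query log, not data from the paper.
QUERIES = [
    "new york hotels",
    "new york weather",
    "new york pizza",
    "new car prices",
    "cheap hotels",
]

if __name__ == "__main__":
    for phrase, support, confidence in candidate_concepts(QUERIES):
        print(phrase, support, confidence)
```

On this toy log only "new york" survives: it occurs three times, and three of the four queries containing "new" continue with "york". A realistic pipeline would need scalable counting and more careful measures than this sketch provides.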

Paper ID: 10106
Venue: VLDB
Year: 2010
Pagerank: 5.4562198e-05
Overall Rank: 5,531 | 61.57%
DOI: -

Incoming Citations (Sorted by Pagerank)

Showing 3 of 3 citing papers.

Rank	Citing Paper	Year	Venue	Pagerank
7,913	Mining Quality Phrases from Massive Text Corpora	2015	SIGMOD	4.6139203e-05
11,595	GIANT: Scalable Creation of a Web-scale Ontology	2020	SIGMOD	4.1905499e-05
11,783	Building Structured Databases of Factual Knowledge from Massive Text Corpora	2017	SIGMOD	4.1905499e-05

Outgoing Citations (Sorted by Pagerank)

Showing 1 of 1 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank	Cited Paper	Year	Venue	Pagerank
1,221	A Web of Concepts	2009	PODS	0.00013213428

Semantically Similar Papers

Overall Rank	Paper	Year	Venue	Pagerank
5,382	Scalable Ad-hoc Entity Extraction from Text Collections	2008	VLDB	5.5358382e-05
7,890	Mining a Search Engine’s Corpus: Efficient Yet Unbiased Sampling and Aggregate Estimation	2011	SIGMOD	4.6205184e-05
1,140	EntityRank: Searching Entities Directly and Holistically	2007	VLDB	0.00013709412
4,671	Extracting large-scale knowledge bases from the web	1999	VLDB	6.0023843e-05
4,475	Measure-driven Keyword-Query Expansion	2009	VLDB	6.1469582e-05
4,097	Structured Annotations of Web Queries	2010	SIGMOD	6.4504937e-05
2,324	Expressive and Flexible Access to Web-Extracted Data: A Keyword-based Structured Query Language	2010	SIGMOD	9.0289103e-05
11,983	Which Concepts Are Worth Extracting?	2014	SIGMOD	4.1905499e-05
7,590	Scalable Column Concept Determination for Web Tables Using Large Knowledge Bases	2013	VLDB	4.6987033e-05
1,221	A Web of Concepts	2009	PODS	0.00013213428