Scalable Topical Phrase Mining from Text Corpora
Summary: Introduces scalable topical phrase mining, which combines a phrase-mining stage with a partition-based topic model. It outperforms both unigram-only topic models and computationally costly n-gram topic models, producing high-quality topical phrases at negligible additional cost across corpora. (summarized by gpt-5-nano on Feb 09 2026)
Incoming Non-self Citations Over Time
No non-self incoming citations found for this paper in this database.
Authors
- 1. Ahmed El-Kishky
- 2. Yanglei Song
- 3. Chi Wang
- 4. Clare R. Voss
- 5. Jiawei Han
Incoming Citations (Sorted by Pagerank)
Showing 4 of 4 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 7,912 | Mining Quality Phrases from Massive Text Corpora | 2015 | SIGMOD | 4.6183486e-05 |
| 9,136 | TextCube: Automated Construction and Multidimensional Exploration | 2019 | VLDB | 4.3881065e-05 |
| 11,775 | Building Structured Databases of Factual Knowledge from Massive Text Corpora | 2017 | SIGMOD | 4.1945683e-05 |
| 11,847 | Automatic Entity Recognition and Typing in Massive Text Data | 2016 | SIGMOD | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 2 of 2 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 36 | Fast Algorithms for Mining Association Rules | 1994 | VLDB | 0.00076161096 |
| 181 | Mining Frequent Patterns without Candidate Generation | 2000 | SIGMOD | 0.00036992674 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 11,844 | Potential and Pitfalls of Domain-Specific Information Extraction at Web Scale | 2016 | SIGMOD | 4.1945683e-05 |
| 5,520 | Towards the Web of Concepts: Extracting Concepts from Large Datasets | 2010 | VLDB | 5.4614656e-05 |
| 2,116 | On the Spatiotemporal Burstiness of Terms | 2012 | VLDB | 9.5180761e-05 |
| 4,474 | Measure-driven Keyword-Query Expansion | 2009 | VLDB | 6.1528736e-05 |
| 2,374 | Seeking Stable Clusters in the Blogosphere | 2007 | VLDB | 8.9452874e-05 |
| 5,379 | Scalable Ad-hoc Entity Extraction from Text Collections | 2008 | VLDB | 5.5405989e-05 |
| 13,328 | Scalable Training of Hierarchical Topic Models | 2018 | VLDB | - |
| 11,834 | Topic Exploration in Spatio-Temporal Document Collections | 2016 | SIGMOD | 4.1945683e-05 |
| 7,912 | Mining Quality Phrases from Massive Text Corpora | 2015 | SIGMOD | 4.6183486e-05 |
| 6,684 | Interesting-Phrase Mining for Ad-Hoc Text Analytics | 2010 | VLDB | 4.9629004e-05 |