Mining Quality Phrases from Massive Text Corpora
Summary: Proposes a scalable framework for mining quality phrases from massive text corpora by integrating phrasal segmentation with limited supervision. Demonstrates near-human phrase quality and linear time/space scalability, validated on large corpora. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Jialu Liu
2. Jingbo Shang
3. Chi Wang
4. Xiang Ren
5. Jiawei Han
Incoming Citations (Sorted by Pagerank)
Showing 5 of 5 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 9,136 | TextCube: Automated Construction and Multidimensional Exploration | 2019 | VLDB | 4.3881065e-05 |
| 10,983 | A Universal Sketch for Estimating Heavy Hitters and Per-Element Frequency Moments in Data Streams with Bounded Deletions | 2024 | SIGMOD | 4.1945683e-05 |
| 11,591 | GIANT: Scalable Creation of a Web-scale Ontology | 2020 | SIGMOD | 4.1945683e-05 |
| 11,775 | Building Structured Databases of Factual Knowledge from Massive Text Corpora | 2017 | SIGMOD | 4.1945683e-05 |
| 11,847 | Automatic Entity Recognition and Typing in Massive Text Data | 2016 | SIGMOD | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 4 of 4 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 3,256 | Multidimensional Content eXploration | 2008 | VLDB | 7.3158557e-05 |
| 5,520 | Towards the Web of Concepts: Extracting Concepts from Large Datasets | 2010 | VLDB | 5.4614656e-05 |
| 6,684 | Interesting-Phrase Mining for Ad-Hoc Text Analytics | 2010 | VLDB | 4.9629004e-05 |
| 11,954 | Scalable Topical Phrase Mining from Text Corpora | 2015 | VLDB | 4.1945683e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 10,973 | Unstructured Data Fusion for Schema and Data Extraction | 2024 | SIGMOD | 4.1945683e-05 |
| 11,755 | Scalable Semantic Querying of Text | 2018 | VLDB | 4.1945683e-05 |
| 2,319 | Expressive and Flexible Access to Web-Extracted Data: A Keyword-based Structured Query Language | 2010 | SIGMOD | 9.0387108e-05 |
| 11,844 | Potential and Pitfalls of Domain-Specific Information Extraction at Web Scale | 2016 | SIGMOD | 4.1945683e-05 |
| 7,474 | Cardinality Estimation of Approximate Substring Queries using Deep Learning | 2022 | VLDB | 4.7194345e-05 |
| 11,971 | Mining Latent Entity Structures from Massive Unstructured and Interconnected Data | 2014 | SIGMOD | 4.1945683e-05 |
| 6,729 | Keyword Query Cleaning | 2008 | VLDB | 4.9483065e-05 |
| 11,775 | Building Structured Databases of Factual Knowledge from Massive Text Corpora | 2017 | SIGMOD | 4.1945683e-05 |
| 11,954 | Scalable Topical Phrase Mining from Text Corpora | 2015 | VLDB | 4.1945683e-05 |
| 6,684 | Interesting-Phrase Mining for Ad-Hoc Text Analytics | 2010 | VLDB | 4.9629004e-05 |