Interesting-Phrase Mining for Ad-Hoc Text Analytics
Summary: Introduces a phrase-centric framework for ad-hoc text analytics, prioritizing multi-word phrases that are frequent in a subset yet rare in the full corpus. Develops preprocessing, indexing, and top-k search methods for scalable discovery, validated on a large NYT corpus. (summarized by gpt-5-nano on Feb 09 2026)
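To make the idea of "frequent in a subset yet rare in the full corpus" concrete, the following is a minimal brute-force sketch, not the paper's indexing or top-k search method. It assumes a simple score: the number of subset documents containing a phrase divided by the number of corpus documents containing it. The function names, the `min_support` threshold, and the whitespace tokenization are all hypothetical choices made for illustration.

```python
from collections import Counter


def ngrams(tokens, max_len):
    """Yield every multi-word n-gram (2..max_len words) as a tuple."""
    for n in range(2, max_len + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def phrase_doc_freq(docs, max_len=3):
    """Count, for each phrase, how many documents contain it."""
    freq = Counter()
    for doc in docs:
        # set() so a phrase is counted once per document
        freq.update(set(ngrams(doc.lower().split(), max_len)))
    return freq


def top_k_interesting(corpus, subset_ids, k=3, min_support=2, max_len=3):
    """Rank phrases by (assumed) score: subset doc freq / corpus doc freq."""
    global_freq = phrase_doc_freq(corpus, max_len)
    local_freq = phrase_doc_freq([corpus[i] for i in subset_ids], max_len)
    scored = [
        (local_freq[p] / global_freq[p], local_freq[p], " ".join(p))
        for p in local_freq
        if local_freq[p] >= min_support  # drop phrases too rare in the subset
    ]
    # Highest score first; ties broken by subset frequency, then alphabetically
    scored.sort(key=lambda t: (-t[0], -t[1], t[2]))
    return [(phrase, score) for score, _, phrase in scored[:k]]


# Toy corpus; the ad-hoc subset is documents 0 and 1.
corpus = [
    "the stock market fell sharply",
    "the stock market rallied today",
    "the weather was mild today",
    "the stock price rose",
]
result = top_k_interesting(corpus, subset_ids=[0, 1], k=3)
```

This sketch recomputes all phrase counts for every query, which is exactly the cost that preprocessing and indexing are meant to avoid on a large corpus.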
Authors
1. Srikanta Bedathur
2. Klaus Berberich
3. Jens Dittrich
4. Nikos Mamoulis
5. Gerhard Weikum
Incoming Citations (Sorted by Pagerank)
Showing 1 of 1 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 7,912 | Mining Quality Phrases from Massive Text Corpora | 2015 | SIGMOD | 4.6183486e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 5 of 5 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 181 | Mining Frequent Patterns without Candidate Generation | 2000 | SIGMOD | 0.00036992674 |
| 2,166 | BlogScope: A System for Online Analysis of High Volume Text Streams | 2007 | VLDB | 9.3896206e-05 |
| 3,256 | Multidimensional Content eXploration | 2008 | VLDB | 7.3158557e-05 |
| 4,693 | Multi-Structural Databases | 2005 | PODS | 5.9955924e-05 |
| 6,370 | Efficient Implementation of Large-Scale Multi-Structural Databases | 2005 | VLDB | 5.0935585e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 2,374 | Seeking Stable Clusters in the Blogosphere | 2007 | VLDB | 8.9452874e-05 |
| 11,971 | Mining Latent Entity Structures from Massive Unstructured and Interconnected Data | 2014 | SIGMOD | 4.1945683e-05 |
| 9,304 | Phrase Matching in XML | 2003 | VLDB | 4.3578291e-05 |
| 13,401 | NewsNetExplorer: Automatic Construction and Exploration of News Information Networks | 2014 | SIGMOD | - |
| 7,890 | Mining a Search Engine’s Corpus: Efficient Yet Unbiased Sampling and Aggregate Estimation | 2011 | SIGMOD | 4.6249533e-05 |
| 11,844 | Potential and Pitfalls of Domain-Specific Information Extraction at Web Scale | 2016 | SIGMOD | 4.1945683e-05 |
| 5,379 | Scalable Ad-hoc Entity Extraction from Text Collections | 2008 | VLDB | 5.5405989e-05 |
| 11,775 | Building Structured Databases of Factual Knowledge from Massive Text Corpora | 2017 | SIGMOD | 4.1945683e-05 |
| 7,912 | Mining Quality Phrases from Massive Text Corpora | 2015 | SIGMOD | 4.6183486e-05 |
| 11,954 | Scalable Topical Phrase Mining from Text Corpora | 2015 | VLDB | 4.1945683e-05 |