A New Sparse Data Clustering Method Based On Frequent Items
Summary: The paper proposes k-FreqItems, a scalable clustering method for high-dimensional, sparse categorical data. Each cluster is represented by a sparse FreqItem center and objects are compared by Jaccard distance, which keeps the clusters interpretable. SILK, an LSH-based seeding technique, oversamples frequently co-occurring data to seed k-FreqItems, giving faster, more effective initialization and scalability to billions of objects on commodity GPUs (code: https://github.com/HuangQiang/k-FreqItems). (Summarized by gpt-5-nano on Feb 09 2026.)
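To make the two ingredients named in the summary concrete, here is a minimal Python sketch of Jaccard distance over item sets and a frequent-item cluster center. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `min_share` threshold, and the rule "keep items present in at least that share of members" are hypothetical stand-ins for the paper's FreqItem definition, and the LSH-based SILK seeding is not shown.

```python
from collections import Counter


def jaccard_distance(a, b):
    """Return 1 - |a ∩ b| / |a ∪ b|; two empty sets are treated as distance 0."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def frequent_item_center(members, min_share=0.5):
    """Hypothetical center rule (an assumption, not the paper's exact definition):
    keep every item that occurs in at least `min_share` of the cluster members."""
    counts = Counter(item for member in members for item in member)
    threshold = min_share * len(members)
    return frozenset(item for item, count in counts.items() if count >= threshold)


def assign(points, centers):
    """Assign each point to the index of its nearest center under Jaccard distance."""
    return [
        min(range(len(centers)), key=lambda j: jaccard_distance(p, centers[j]))
        for p in points
    ]


if __name__ == "__main__":
    cluster = [frozenset({1, 2, 3}), frozenset({1, 2, 4})]
    center = frequent_item_center(cluster, min_share=0.6)
    print(sorted(center))
    print(assign([frozenset({1, 2, 3}), frozenset({7, 8, 9})],
                 [center, frozenset({7, 8})]))
```

Because the center is itself a small set of items, it can be read directly as a description of the cluster, which is the interpretability argument the summary makes.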
Authors
1. Qiang Huang
2. Pingyi Luo
3. Anthony K. H. Tung
Incoming Citations (Sorted by Pagerank)
Showing 1 of 1 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 10,522 | SBSC: A fast Self-tuned Bipartite proximity graph-based Spectral Clustering | 2025 | SIGMOD | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 19 of 19 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 3,555 | Fast Parallel Similarity Search in Multimedia Databases | 1997 | SIGMOD | 6.9772546e-05 |
| 12,462 | Optimization of Frequent Itemset Mining on Multiple-Core Processor | 2007 | VLDB | 4.1945683e-05 |
| 10,522 | SBSC: A fast Self-tuned Bipartite proximity graph-based Spectral Clustering | 2025 | SIGMOD | 4.1945683e-05 |
| 10,208 | Scalable Clustering Over High Dimensional Vector Streams | 2026 | SIGMOD | 4.1945683e-05 |
| 9,064 | Feasible Itemset Distributions in Data Mining: Theory and Application | 2003 | PODS | 4.4039656e-05 |
| 11,466 | Fast Density-Peaks Clustering: Multicore-based Parallelization Approach | 2021 | SIGMOD | 4.1945683e-05 |
| 1,595 | Fast Algorithms for Projected Clustering | 1999 | SIGMOD | 1.1222442e-04 |
| 10,034 | SieveSketch: A Fine-grained and Adaptive Sketch Framework for Accurate Frequency Estimation | 2026 | SIGMOD | 4.1945683e-05 |
| 835 | Finding Frequent Items in Data Streams | 2008 | VLDB | 1.6109621e-04 |
| 10,930 | Similarity Joins of Sparse Features | 2024 | SIGMOD | 4.1945683e-05 |