Database Paper Browser

A New Sparse Data Clustering Method Based On Frequent Items

Summary: Proposes k-FreqItems, a scalable clustering method for high-dimensional, sparse categorical data. It represents each cluster with a sparse FreqItem center and measures similarity with Jaccard distance, which keeps clusters interpretable. SILK, an LSH-based seeding technique, oversamples frequent co-occurrences to seed k-FreqItems, giving faster, more effective initialization and scalability to billions of objects on commodity GPUs (code: https://github.com/HuangQiang/k-FreqItems). (summarized by gpt-5-nano on Feb 09 2026)
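
The two ingredients named in the summary, Jaccard distance and a center built from frequent items, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `min_fraction` threshold, and the rule "keep items present in at least that fraction of members" are assumptions made for illustration only.

```python
from collections import Counter


def jaccard_distance(a, b):
    """Jaccard distance between two item sets: 1 - |a & b| / |a | b|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # assumed convention: two empty sets are treated as identical
    return 1.0 - len(a & b) / len(union)


def frequent_item_center(members, min_fraction=0.5):
    """Hypothetical sparse center: items found in at least min_fraction of members."""
    counts = Counter(item for m in members for item in set(m))
    threshold = min_fraction * len(members)
    return {item for item, c in counts.items() if c >= threshold}


# Toy cluster of three sparse records, each a set of non-zero dimensions.
cluster = [{1, 2, 3}, {2, 3, 4}, {2, 3, 9}]
center = frequent_item_center(cluster)
distances = [jaccard_distance(record, center) for record in cluster]
```

Because the center is itself a small set of items, it can be read directly as "the items this cluster has in common", which is the interpretability argument the summary makes.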

Paper ID: 6508
Venue: SIGMOD
Year: 2023
Pagerank: 5.2415551e-05
Overall Rank: 5,996 | 58.29%
DOI: 10.1145/3588685

Incoming Citations (Sorted by Pagerank)

Showing 1 of 1 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
10,522 | SBSC: A fast Self-tuned Bipartite proximity graph-based Spectral Clustering | 2025 | SIGMOD | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 19 of 19 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
33 | BIRCH: An Efficient Data Clustering Method for Very Large Databases | 1996 | SIGMOD | 0.00077324389
36 | Fast Algorithms for Mining Association Rules | 1994 | VLDB | 0.00076161096
270 | OPTICS: Ordering Points To Identify the Clustering Structure | 1999 | SIGMOD | 0.00029505642
341 | CURE: An Efficient Clustering Algorithm for Large Databases | 1998 | SIGMOD | 0.00026810548
400 | Multi-Probe LSH: Efficient Indexing for High-Dimensional Similarity Search | 2007 | VLDB | 0.0002427237
562 | Query-Aware Locality-Sensitive Hashing for Approximate Nearest Neighbor Search | 2016 | VLDB | 0.00020091752
605 | Locality-Sensitive Hashing Scheme Based on Dynamic Collision Counting | 2012 | SIGMOD | 0.000193396
682 | Quality and Efficiency in High Dimensional Nearest Neighbor Search | 2009 | SIGMOD | 0.00018201541
867 | SRS: Solving c-Approximate Nearest Neighbor Queries in High Dimensional Euclidean Space with a Tiny Index | 2015 | VLDB | 0.00015792021
961 | DBSCAN Revisited: Mis-Claim, Un-Fixability, and Approximation | 2015 | SIGMOD | 0.00015001792
1,757 | VHP: Approximate Nearest Neighbor Search via Virtual Hypersphere Partitioning | 2020 | VLDB | 0.00010660932
1,971 | LazyLSH: Approximate Nearest Neighbor Search for Multiple Distance Functions with a Single Index | 2016 | SIGMOD | 9.893198e-05
2,093 | Scalable K-Means++ | 2012 | VLDB | 9.5588104e-05
2,181 | PM-LSH: A Fast and Accurate LSH Framework for High-Dimensional Approximate NN Search | 2020 | VLDB | 9.3451821e-05
2,635 | NG-DBSCAN: Scalable Density-Based Clustering for Arbitrary Data | 2017 | VLDB | 8.4045788e-05
4,243 | Locality-Sensitive Hashing Scheme based on Longest Circular Co-Substring | 2020 | SIGMOD | 6.32976e-05
5,456 | Point-to-Hyperplane Nearest Neighbor Search Beyond the Unit Hypersphere | 2021 | SIGMOD | 5.4976692e-05
5,707 | FARGO: Fast Maximum Inner Product Search via Global Multi-Probing | 2023 | VLDB | 5.3611041e-05
9,303 | MQH: Locality Sensitive Hashing on Multi-level Quantization Errors for Point-to-Hyperplane Distances | 2023 | VLDB | 4.358026e-05

Semantically Similar Papers