Database Paper Browser

Back to papers

Is Min-Wise Hashing Optimal for Summarizing Set Intersection?

Summary: Information-theoretic lower bounds show b-bit minwise hashing is space-optimal (variance within constant factor) for estimating intersections of two predicates, but k-permutation/min-hash schemes deteriorate as m grows. We provide new lower/upper bounds and a novel summary that nearly attains the lower bound for m≥2, asymptotically outperforming k-permutation schemes by Θ(m/log m) and subsampling by Θ(log n_max). (summarized by gpt-5-mini on Feb 09 2026)

Paper ID
1630
Venue
PODS
Year
2014
Pagerank
6.8247903e-05
Overall Rank
3,708 | 74.21%
DOI
10.1145/2594538.2594554

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 7 of 7 citing papers.

Previous Page 1 / 1 Next

Outgoing Citations (Sorted by Pagerank)

Showing 4 of 4 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank Cited Paper Year Venue Pagerank
383 An Optimal Algorithm for the Distinct Elements Problem 2010 PODS 0.00024820873
3,991 Beyond Simple Aggregates: Indexing for Summary Queries 2011 PODS 6.5553055e-05
5,415 Coordinated Weighted Sampling for Estimating Aggregates Over Multiple Weight Assignments 2009 VLDB 5.5196338e-05
5,782 Information Complexity: a Tutorial 2010 PODS 5.3292726e-05
Previous Page 1 / 1 Next

Semantically Similar Papers