Finding Global Icebergs over Distributed Data Sets
Summary: The paper addresses finding global icebergs, items whose aggregate frequency across many nodes exceeds a threshold, even when those items are globally frequent but locally rare, without the prohibitive cost of shipping raw data. It introduces sampling-based and CountSketch-based distributed protocols with provable accuracy; the CountSketch-based protocol cuts communication by an order of magnitude while maintaining high accuracy. (summarized by gpt-5-mini on Feb 09 2026)
Authors
1. Qi (George) Zhao
2. Mitsunori Ogihara
3. Haixun Wang
4. Jun (Jim) Xu
Incoming Citations (Sorted by Pagerank)
Showing 5 of 5 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 5,903 | Building Wavelet Histograms on Large Data in MapReduce | 2012 | VLDB | 5.2791351e-05 |
| 8,040 | Distributed Threshold Querying of General Functions by a Difference of Monotonic Representation | 2011 | VLDB | 4.600049e-05 |
| 8,673 | CoopStore: Optimizing Precomputed Summaries for Aggregation | 2020 | VLDB | 4.4709116e-05 |
| 11,364 | MinMax Sampling: A Near-optimal Global Summary for Aggregation in the Wide Area | 2022 | SIGMOD | 4.1945683e-05 |
| 11,853 | Scalable Approximate Query Tracking over Highly Distributed Data Streams | 2016 | SIGMOD | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 10 of 10 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 166 | Approximate Frequency Counts over Data Streams | 2002 | VLDB | 0.00039361552 |
| 597 | Computing Iceberg Queries Efficiently | 1998 | VLDB | 0.00019475592 |
| 745 | Distributed Top-K Monitoring | 2003 | SIGMOD | 0.00017330487 |
| 781 | Spectral Bloom Filters | 2003 | SIGMOD | 0.00016741046 |
| 848 | Approximate Counts and Quantiles over Sliding Windows | 2004 | PODS | 0.0001597308 |
| 865 | What’s Hot and What’s Not: Tracking Most Frequent Items Dynamically | 2003 | PODS | 0.00015808172 |
| 1,003 | Adaptive Filters for Continuous Queries over Distributed Data Streams | 2003 | SIGMOD | 0.00014698435 |
| 1,136 | Chain: Operator Scheduling for Memory Minimization in Data Stream Systems | 2003 | SIGMOD | 0.00013760154 |
| 1,340 | Scalable Distributed Stream Processing | 2003 | CIDR | 0.00012489223 |
| 5,673 | Distributed Set-Expression Cardinality Estimation | 2004 | VLDB | 5.3780919e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 11,853 | Scalable Approximate Query Tracking over Highly Distributed Data Streams | 2016 | SIGMOD | 4.1945683e-05 |
| 1,955 | Efficient Computation of Iceberg Cubes with Complex Measures | 2001 | SIGMOD | 9.9629452e-05 |
| 8,040 | Distributed Threshold Querying of General Functions by a Difference of Monotonic Representation | 2011 | VLDB | 4.600049e-05 |
| 7,834 | Sketch-based Querying of Distributed Sliding-Window Data Streams | 2012 | VLDB | 4.6382551e-05 |
| 2,282 | Summarizing and Mining Inverse Distributions on Data Streams via Dynamic Inverse Sampling | 2005 | VLDB | 9.1073603e-05 |
| 5,796 | Finding Frequent Items in Probabilistic Data | 2008 | SIGMOD | 5.3240234e-05 |
| 13,817 | Communication-Efficient Distributed Mining of Association Rules | 2001 | SIGMOD | - |
| 835 | Finding Frequent Items in Data Streams | 2008 | VLDB | 0.00016109621 |
| 2,931 | Holistic Aggregates in a Networked World: Distributed Tracking of Approximate Quantiles | 2005 | SIGMOD | 7.8697258e-05 |
| 597 | Computing Iceberg Queries Efficiently | 1998 | VLDB | 0.00019475592 |