Comparing Data Streams Using Hamming Norms (How to Zero In)
Summary: Proposes the Hamming norm as a streaming primitive that covers both distinct counts within a single stream and dissimilarity between two streams. Presents an approximation based on an L0 sketch that gives fast estimates on massive streams, validated on synthetic and real data. (summarized by gpt-5-nano on Feb 09 2026)
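To make the two uses named in the summary concrete, here is a minimal exact baseline in Python. It is not the paper's sketch-based approximation: it stores every counter, which a streaming algorithm is designed to avoid. The function names and the `(item, delta)` update format are assumptions made for illustration.

```python
from collections import Counter

def hamming_norm(updates):
    """Count items whose net frequency is non-zero after all updates.

    `updates` is an iterable of (item, delta) pairs (hypothetical format);
    a negative delta models a deletion.
    """
    counts = Counter()
    for item, delta in updates:
        counts[item] += delta
    return sum(1 for c in counts.values() if c != 0)

def hamming_distance(updates_a, updates_b):
    """Hamming norm of the difference of two streams' frequency vectors."""
    negated_b = [(item, -delta) for item, delta in updates_b]
    return hamming_norm(list(updates_a) + negated_b)

# Toy streams: "z" is inserted and then deleted in stream_a.
stream_a = [("x", 1), ("y", 2), ("z", 1), ("z", -1)]
stream_b = [("x", 1), ("y", 1), ("w", 3)]

print(hamming_norm(stream_a))                # distinct items still present
print(hamming_distance(stream_a, stream_b))  # items whose counts differ
```

With insert-only streams the first function reduces to a plain distinct count. The second counts the items on which the two streams disagree.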
Authors
1. Graham Cormode
2. Mayur Datar
3. Piotr Indyk
4. S. Muthukrishnan
Incoming Citations (Sorted by Pagerank)
Showing 12 of 12 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 9 of 9 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 59 | Sampling-Based Estimation of the Number of Distinct Values of an Attribute | 1995 | VLDB | 0.00064501896 |
| 308 | Distinct Sampling for Highly-Accurate Answers to Distinct Values Queries and Event Reports | 2001 | VLDB | 0.00028142852 |
| 344 | Surfing Wavelets on Streams: One-Pass Summaries for Approximate Aggregate Queries | 2001 | VLDB | 0.00026702512 |
| 378 | Towards Estimation Error Guarantees for Distinct Values | 2000 | PODS | 0.0002497492 |
| 475 | Mining Database Structure; Or, How to Build a Data Quality Browser | 2002 | SIGMOD | 0.00022303253 |
| 530 | Random Sampling for Histogram Construction: How much is enough? | 1998 | SIGMOD | 0.00020803682 |
| 852 | Dynamic Multidimensional Histograms | 2002 | SIGMOD | 0.00015941524 |
| 1,655 | Gigascope: High Performance Network Monitoring with an SQL Interface | 2002 | SIGMOD | 0.00010997332 |
| 3,794 | Identifying Representative Trends in Massive Time Series Data Sets Using Sketches | 2000 | VLDB | 0.000067617267 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 166 | Approximate Frequency Counts over Data Streams | 2002 | VLDB | 0.00039361552 |
| 3,041 | Sketching Probabilistic Data Streams | 2007 | SIGMOD | 0.000076697078 |
| 1,064 | Processing Complex Aggregate Queries over Data Streams | 2002 | SIGMOD | 0.00014356481 |
| 2,282 | Summarizing and Mining Inverse Distributions on Data Streams via Dynamic Inverse Sampling | 2005 | VLDB | 0.000091073603 |
| 12,108 | Space-Efficient Estimation of Statistics over Sub-Sampled Streams | 2012 | PODS | 0.000041945683 |
| 10,358 | Robust Statistical Analysis on Streaming Data with Near-Duplicates in General Metric Spaces | 2025 | PODS | 0.000041945683 |
| 8,451 | Efficient Framework for Operating on Data Sketches | 2023 | VLDB | 0.000045086031 |
| 4,905 | Randomized Error Removal for Online Spread Estimation in Data Streaming | 2021 | VLDB | 0.000058398332 |
| 11,833 | Streaming Algorithms for Robust Distinct Elements | 2016 | SIGMOD | 0.000041945683 |
| 383 | An Optimal Algorithm for the Distinct Elements Problem | 2010 | PODS | 0.00024820873 |