Finding Persistent Items in Data Streams
Summary: Proposes the problem of mining persistent items in data streams and introduces PIE, a compact scheme for detecting items that recur over a long period. PIE encodes item IDs with Raptor codes and stores only a few bits per observation window, which allows exact ID recovery with a very low false negative rate (FNR); experiments on real traces show up to a 19.5x FNR reduction compared with prior art. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Haipeng Dai
2. Muhammad Shahzad
3. Alex X. Liu
4. Yuankun Zhong
Incoming Citations (Sorted by Pagerank)
Showing 6 of 6 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1,941 | Cold Filter: A Meta-Framework for Faster and More Accurate Stream Processing | 2018 | SIGMOD | 0.00010017745 |
| 3,751 | BurstSketch: Finding Bursts in Data Streams | 2021 | SIGMOD | 6.7888099e-05 |
| 6,790 | On-Off Sketch: A Fast and Accurate Sketch on Persistence | 2021 | VLDB | 4.9251439e-05 |
| 7,732 | Double-Anonymous Sketch: Achieving Top-K-fairness for Finding Global Top-K Frequent Items | 2023 | SIGMOD | 4.6657123e-05 |
| 8,250 | Stingy Sketch: A Sketch Framework for Accurate and Fast Frequency Estimation | 2022 | VLDB | 4.5506131e-05 |
| 10,386 | Pandora: An Efficient and Rapid Solution for Persistence-Based Tasks in High-Speed Data Streams | 2025 | SIGMOD | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 2 of 2 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 166 | Approximate Frequency Counts over Data Streams | 2002 | VLDB | 0.00039361552 |
| 835 | Finding Frequent Items in Data Streams | 2008 | VLDB | 0.00016109621 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 11,440 | Frequent Elements with Witnesses in Data Streams | 2021 | PODS | 4.1945683e-05 |
| 13,796 | Mining Frequent Itemsets with Bit Strings and Trie | 2002 | VLDB | - |
| 5,772 | Mining Frequent Patterns with Differential Privacy | 2013 | VLDB | 5.3322378e-05 |
| 166 | Approximate Frequency Counts over Data Streams | 2002 | VLDB | 0.00039361552 |
| 6,342 | A Regression-Based Temporal Pattern Mining Scheme for Data Streams | 2003 | VLDB | 5.1034654e-05 |
| 11,978 | Resource-oriented Approximation for Frequent Itemset Mining from Bursty Data Streams | 2014 | SIGMOD | 4.1945683e-05 |
| 6,790 | On-Off Sketch: A Fast and Accurate Sketch on Persistence | 2021 | VLDB | 4.9251439e-05 |
| 6,599 | Local Differentially Private Heavy Hitter Detection in Data Streams with Bounded Memory | 2024 | SIGMOD | 4.9973567e-05 |
| 4,449 | False Positive or False Negative: Mining Frequent Itemsets from High Speed Transactional Data Streams | 2004 | VLDB | 6.1780147e-05 |
| 835 | Finding Frequent Items in Data Streams | 2008 | VLDB | 0.00016109621 |