Approximately Detecting Duplicates for Streaming Data using Stable Bloom Filters
Summary: Proposes the Stable Bloom Filter (SBF), a fixed-space bitmap sketch for approximate duplicate detection in data streams that continually evicts stale entries. The analytical bound on the false positive rate (FPR) is tight; experiments show that SBF outperforms baselines in both accuracy and speed when space is small. (summarized by gpt-5-nano on Feb 09 2026)
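The eviction idea in the summary can be illustrated with a minimal sketch. It assumes the usual description of an SBF: an array of small counters, `K` hash positions per element, `P` cells decremented on each arrival, and each touched cell reset to a maximum value. The class name, parameter names, default values, and the salted SHA-256 hashing are illustrative assumptions, not taken from this page or from the authors' implementation.

```python
import hashlib
import random


class StableBloomFilter:
    """Illustrative Stable Bloom Filter; all names and defaults are assumptions."""

    def __init__(self, num_cells=1024, num_hashes=3, max_value=3,
                 decrements=10, seed=0):
        self.cells = [0] * num_cells      # fixed-space array of small counters
        self.num_hashes = num_hashes      # K: cells probed per element
        self.max_value = max_value        # value a cell is reset to on insert
        self.decrements = decrements      # P: cells aged on every arrival
        self.rng = random.Random(seed)    # seeded so runs are repeatable

    def _indexes(self, item):
        # Derive K cell indexes from salted SHA-256 digests (an assumption;
        # any family of reasonably independent hash functions would do).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % len(self.cells)

    def seen_then_add(self, item):
        """Report whether item looks like a duplicate, then record it."""
        idx = list(self._indexes(item))
        # Duplicate test: every probed cell must still be non-zero.
        duplicate = all(self.cells[i] > 0 for i in idx)

        # Eviction step: age P consecutive cells from a random start,
        # so stale entries gradually decay back to zero.
        start = self.rng.randrange(len(self.cells))
        for offset in range(self.decrements):
            j = (start + offset) % len(self.cells)
            if self.cells[j] > 0:
                self.cells[j] -= 1

        # Insert step: reset the element's own cells to the maximum value.
        for i in idx:
            self.cells[i] = self.max_value
        return duplicate
```

With this sketch, calling `seen_then_add("a")` on a fresh filter returns `False`, and an immediate second call returns `True`. Because cells decay over time, an element that has not appeared recently may be reported as new again (a false negative), which is the price of keeping the space fixed.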
Authors
1. Fan Deng
2. Davood Rafiei
Incoming Citations (Sorted by Pagerank)
Showing 7 of 7 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 2,471 | Morton Filters: Faster, Space-Efficient Cuckoo Filters via Biasing, Compression, and Decoupled Logical Sparsity | 2018 | VLDB | 8.7320072e-05 |
| 4,446 | Stable Learned Bloom Filters for Data Streams | 2020 | VLDB | 6.1800659e-05 |
| 4,994 | Stacked Filters: Learning to Filter by Structure | 2021 | VLDB | 5.78027e-05 |
| 8,015 | Streaming Quotient Filter: A Near Optimal Approximate Duplicate Detection Approach for Data Streams | 2013 | VLDB | 4.6051162e-05 |
| 8,634 | Building Fast and Compact Sketches for Approximately Multi-Set Multi-Membership Querying | 2021 | SIGMOD | 4.4801584e-05 |
| 11,222 | A Learned Cuckoo Filter for Approximate Membership Queries over Variable-sized Sliding Windows on Data Streams | 2023 | SIGMOD | 4.1945683e-05 |
| 11,951 | Tracking the Conductance of Rapidly Evolving Topic-Subgraphs | 2015 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 12 of 12 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 41 | NiagaraCQ: A Scalable Continuous Query System for Internet Databases | 2000 | SIGMOD | 0.00073964959 |
| 43 | Models and Issues in Data Stream Systems | 2002 | PODS | 0.00072723062 |
| 142 | TelegraphCQ: Continuous Dataflow Processing for an Uncertain World | 2003 | CIDR | 0.00041725802 |
| 166 | Approximate Frequency Counts over Data Streams | 2002 | VLDB | 0.00039361552 |
| 205 | Monitoring Streams – A New Class of Data Management Applications | 2002 | VLDB | 0.00034731577 |
| 280 | Eliminating Fuzzy Duplicates in Data Warehouses | 2002 | VLDB | 0.00029113044 |
| 323 | Gigascope: A Stream Database for Network Applications | 2003 | SIGMOD | 0.00027492196 |
| 726 | Load Shedding in a Data Stream Manager | 2003 | VLDB | 0.00017511209 |
| 781 | Spectral Bloom Filters | 2003 | SIGMOD | 0.00016741046 |
| 2,090 | Maintaining Time-Decaying Stream Aggregates | 2003 | PODS | 9.5647927e-05 |
| 2,589 | DogmatiX Tracks down Duplicates in XML | 2005 | SIGMOD | 8.4847146e-05 |
| 3,050 | Comparing Data Streams Using Hamming Norms (How to Zero In) | 2002 | VLDB | 7.6512619e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 6,831 | Prefix Filter: Practically and Theoretically Better Than Bloom | 2022 | VLDB | 4.9130458e-05 |
| 4,879 | Approximately Counting Triangles in Large Graph Streams Including Edge Duplicates with a Fixed Memory Usage | 2018 | VLDB | 5.8575676e-05 |
| 835 | Finding Frequent Items in Data Streams | 2008 | VLDB | 0.00016109621 |
| 8,178 | A Shifting Bloom Filter Framework for Set Queries | 2016 | VLDB | 4.5672537e-05 |
| 11,833 | Streaming Algorithms for Robust Distinct Elements | 2016 | SIGMOD | 4.1945683e-05 |
| 4,994 | Stacked Filters: Learning to Filter by Structure | 2021 | VLDB | 5.78027e-05 |
| 5,332 | Persistent Bloom Filter: Membership Testing for the Entire History | 2018 | SIGMOD | 5.5662513e-05 |
| 781 | Spectral Bloom Filters | 2003 | SIGMOD | 0.00016741046 |
| 4,446 | Stable Learned Bloom Filters for Data Streams | 2020 | VLDB | 6.1800659e-05 |
| 8,015 | Streaming Quotient Filter: A Near Optimal Approximate Duplicate Detection Approach for Data Streams | 2013 | VLDB | 4.6051162e-05 |