Efficient Document Analytics on Compressed Data: Method, Challenges, Algorithms, Insights
Summary: The paper presents compression-based direct processing, a method for running document analytics directly on compressed data using Sequitur's hierarchical grammars. It provides guidelines and modules to make the approach practical; experiments show 90.8% storage savings, 77.5% memory savings, and speedups of 1.6x (sequential) to 2.2x (distributed). (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Feng Zhang
2. Jidong Zhai
3. Xipeng Shen
4. Onur Mutlu
5. Wenguang Chen
Incoming Citations (Sorted by Pagerank)
Showing 12 of 12 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 3 of 3 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 4 | Pregel: A System for Large-Scale Graph Processing | 2010 | SIGMOD | 0.0019005923 |
| 193 | On Supporting Containment Queries in Relational Database Management Systems | 2001 | SIGMOD | 0.00035610321 |
| 459 | Processing Analytical Queries over Encrypted Data | 2013 | VLDB | 0.00022627746 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 10,161 | Enabling Efficient Direct Update on Rule-Based Compressed Graph | 2026 | SIGMOD | 4.1945683e-05 |
| 6,018 | Relative Lempel-Ziv Factorization for Efficient Storage and Retrieval of Web Collections | 2012 | VLDB | 5.2415551e-05 |
| 8,738 | Enumeration for MSO-Queries on Compressed Trees | 2024 | PODS | 4.456315e-05 |
| 1,100 | Query Optimization In Compressed Database Systems | 2001 | SIGMOD | 0.00014072277 |
| 8,496 | Dynamic Data Structures for Document Collections and Graphs | 2015 | PODS | 4.4981899e-05 |
| 693 | Efficiently Supporting Ad Hoc Queries in Large Datasets of Time Sequences | 1997 | SIGMOD | 0.00018077335 |
| 3,563 | Spanner Evaluation over SLP-Compressed Documents | 2021 | PODS | 6.9690833e-05 |
| 3,497 | A New Compression Method with Fast Searching on Large Databases | 1987 | VLDB | 7.0390264e-05 |
| 1,128 | An Efficient Indexing Technique for Full-Text Database Systems | 1992 | VLDB | 0.00013794088 |
| 7,429 | CompressDB: Enabling Efficient Compressed Data Direct Processing for Various Databases | 2022 | SIGMOD | 4.7320139e-05 |