Database Paper Browser


Filter Before You Parse: Faster Analytics on Raw Data with Sparser

Summary: Raw filtering applies query predicates directly to the raw bytestream before parsing, so records that cannot match are discarded without being parsed, which sharply reduces parsing overhead. Sparser builds cascades of SIMD-accelerated raw filters (RFs) and uses a lightweight optimizer to pick the best cascade for the data and format at hand (JSON, Avro, Parquet), delivering speedups of up to 22x in parsing and up to 9x end to end. (summarized by gpt-5-nano on Feb 09 2026)
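The idea of filtering before parsing can be illustrated with a minimal Python sketch. This is not Sparser's implementation (which, per the summary, uses SIMD filter cascades chosen by an optimizer); it only shows the general pattern of a cheap byte-level check followed by a full parse and an exact predicate check. The function name `raw_filter_then_parse`, the `stats` dictionary, and the sample records are hypothetical, and the sketch assumes the search string appears unescaped in the encoded JSON.

```python
import json

def raw_filter_then_parse(lines, needle, predicate, stats=None):
    """Yield parsed records that pass a raw byte filter and then the exact predicate.

    Hypothetical helper for illustration only.
    lines:     iterable of bytes, one JSON object per element
    needle:    bytes that must occur in the raw record for it to possibly match
    predicate: exact check applied to the parsed record
    stats:     optional dict that receives the counters "seen" and "parsed"
    """
    if stats is None:
        stats = {}
    stats["seen"] = 0
    stats["parsed"] = 0
    for raw in lines:
        stats["seen"] += 1
        # Raw filter: a substring test on the unparsed bytes. It can let
        # false positives through, so the exact predicate still runs below.
        if needle not in raw:
            continue
        record = json.loads(raw)  # full parse only for surviving records
        stats["parsed"] += 1
        if predicate(record):
            yield record

# Hypothetical sample data: three newline-delimited JSON records.
sample = [
    b'{"user": "alice", "text": "hello world"}',
    b'{"user": "bob", "text": "nothing here"}',
    b'{"user": "hello", "text": "false positive"}',
]
stats = {}
matches = list(
    raw_filter_then_parse(sample, b"hello", lambda r: "hello" in r["text"], stats)
)
```

In this sample the second record is rejected without being parsed, while the third passes the raw filter (the search string occurs in a different field) and is only rejected by the exact predicate after parsing. That is why a raw filter has to be treated as a pre-filter rather than a replacement for the predicate.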

Paper ID: 11643
Venue: VLDB
Year: 2018
Pagerank: 8.2728509e-05
Overall Rank: 2,700 | 81.22%
DOI: 10.14778/3236187.3236207

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 17 of 17 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
2,122 | SystemDS: A Declarative Machine Learning System for the End-to-End Data Science Lifecycle | 2020 | CIDR | 9.4989076e-05
3,259 | AS-Parser: Log Parsing Based on Adaptive Segmentation | 2023 | SIGMOD | 7.3147783e-05
3,437 | Speculative Distributed CSV Data Parsing for Big Data Analytics | 2019 | SIGMOD | 7.0942161e-05
4,602 | Accelerating Raw Data Analysis with the ACCORDA Software and Hardware Architecture | 2019 | VLDB | 6.0567387e-05
4,704 | JSON Tiles: Fast Analytics on Semi-Structured Data | 2021 | SIGMOD | 5.9853687e-05
6,282 | Cheetah: Accelerating Database Queries with Switch Pruning | 2020 | SIGMOD | 5.128797e-05
7,360 | ParPaRaw: Massively Parallel Parsing of Delimiter-Separated Raw Data | 2020 | VLDB | 4.7525925e-05
7,427 | Selection Pushdown in Column Stores using Bit Manipulation Instructions | 2023 | SIGMOD | 4.7327406e-05
7,497 | Stackless Processing of Streamed Trees | 2021 | PODS | 4.7180617e-05
7,830 | Scalable Structural Index Construction for JSON Analytics | 2021 | VLDB | 4.6388763e-05
8,788 | FishStore: Faster Ingestion with Subset Hashing | 2019 | SIGMOD | 4.451039e-05
9,124 | Dynamic Speculative Optimizations for SQL Compilation in Apache Spark | 2020 | VLDB | 4.391961e-05
9,379 | GIO: Generating Efficient Matrix and Frame Readers for Custom Data Formats by Example | 2023 | SIGMOD | 4.3462787e-05
9,837 | GpJSON: High-performance JSON Data Processing on GPUs | 2025 | VLDB | 4.2740344e-05
10,482 | Fast and Scalable Data Transfer Across Data Systems | 2025 | SIGMOD | 4.1945683e-05
11,150 | Zed: Leveraging Data Types to Process Eclectic Data | 2023 | CIDR | 4.1945683e-05
11,189 | dsJSON: A Distributed SQL JSON Processor | 2023 | SIGMOD | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 15 of 15 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.


Semantically Similar Papers