Rumble: Data Independence for Large Messy Data Sets

Summary: Rumble delivers data independence for large, nested JSON on Spark by compiling JSONiq into an iterator tree whose nodes switch between local and distributed execution. By mapping JSON nesting onto Spark primitives, it overcomes the impedance mismatch between the two, scales to terabytes of data, and demonstrates Codd-style data independence for heterogeneous data. (summarized by gpt-5-nano on Feb 09 2026)
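
The idea of one iterator tree that can run either locally or distributed can be illustrated with a minimal Python sketch. All class and method names here (RuntimeIterator, SourceIterator, MapIterator, is_distributed, local_items, partitions) are hypothetical and are not Rumble's actual classes; a plain list of partitions stands in for a Spark RDD, and the mode switch assumes a node is distributed exactly when its input is.

```python
from typing import Any, Callable, Iterator, List, Optional

# Hypothetical sketch: not Rumble's API. A list of partitions stands in for a Spark RDD.


class RuntimeIterator:
    """Base node of the iterator tree; each node can run locally or over partitions."""

    def is_distributed(self) -> bool:
        raise NotImplementedError

    def local_items(self) -> Iterator[Any]:
        # Local mode: pull items one at a time (Volcano-style streaming).
        raise NotImplementedError

    def partitions(self) -> List[List[Any]]:
        # Distributed mode: items split into partitions (stand-in for an RDD).
        raise NotImplementedError

    def collect(self) -> List[Any]:
        # The consumer does not care which mode was chosen underneath.
        if self.is_distributed():
            return [x for part in self.partitions() for x in part]
        return list(self.local_items())


class SourceIterator(RuntimeIterator):
    """Leaf node; num_partitions=None means the input is small enough to run locally."""

    def __init__(self, items: List[Any], num_partitions: Optional[int] = None):
        self.items = items
        self.num_partitions = num_partitions

    def is_distributed(self) -> bool:
        return self.num_partitions is not None

    def local_items(self) -> Iterator[Any]:
        yield from self.items

    def partitions(self) -> List[List[Any]]:
        n = self.num_partitions or 1
        size = -(-len(self.items) // n)  # ceiling division keeps partitions contiguous
        return [self.items[i * size:(i + 1) * size] for i in range(n)]


class MapIterator(RuntimeIterator):
    """Applies fn to every item and inherits its execution mode from its child."""

    def __init__(self, child: RuntimeIterator, fn: Callable[[Any], Any]):
        self.child = child
        self.fn = fn

    def is_distributed(self) -> bool:
        # The mode switch: distributed if and only if the child is distributed.
        return self.child.is_distributed()

    def local_items(self) -> Iterator[Any]:
        for item in self.child.local_items():
            yield self.fn(item)

    def partitions(self) -> List[List[Any]]:
        # On Spark this would correspond to mapping over the RDD.
        return [[self.fn(x) for x in part] for part in self.child.partitions()]


# Nested, heterogeneous JSON-like records.
docs = [{"name": "a", "tags": ["x"]}, {"name": "b", "tags": []}, {"name": "c", "tags": ["y", "z"]}]
small = MapIterator(SourceIterator(docs), lambda d: len(d["tags"]))
big = MapIterator(SourceIterator(docs, num_partitions=2), lambda d: len(d["tags"]))
print(small.is_distributed(), small.collect())  # False [1, 0, 2]
print(big.is_distributed(), big.collect())      # True [1, 0, 2]
```

The same query tree yields the same result in both modes; only the leaf's size hint changes how it executes, which is the sense of data independence the summary describes.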

Paper ID: 12570
Venue: VLDB
Year: 2021
Pagerank: 4.5410785e-05
Overall Rank: 8,267 | 42.55%
DOI: 10.14778/3436905.3436910

Incoming Citations (Sorted by Pagerank)

Showing 3 of 3 citing papers.

Rank	Citing Paper	Year	Venue	Pagerank
9,355	GIO: Generating Efficient Matrix and Frame Readers for Custom Data Formats by Example	2023	SIGMOD	4.3484264e-05
9,701	Evaluating Query Languages and Systems for High-Energy Physics Data	2022	VLDB	4.2967256e-05
11,517	TraNCE: Transforming Nested Collections Efficiently	2021	VLDB	4.1905499e-05

Outgoing Citations (Sorted by Pagerank)

Showing 5 of 5 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank	Cited Paper	Year	Venue	Pagerank
1,265	Jaql: A Scripting Language for Large Scale Semistructured Data Analysis	2011	VLDB	0.00012933537
1,346	NoDB: Efficient Query Execution on Raw Data Files	2012	SIGMOD	0.00012472598
1,437	AsterixDB: A Scalable, Open Source BDMS	2014	VLDB	0.00011973401
2,286	SMOKE: Fine-grained Lineage at Interactive Speed	2018	VLDB	9.102574e-05
7,796	Large-scale Complex Analytics on Semi-structured Datasets using AsterixDB and Spark	2016	VLDB	4.6438468e-05

Semantically Similar Papers

Overall Rank	Paper	Year	Venue	Pagerank
11,250	Scalable Reasoning on Document Stores via Instance-Aware Query Rewriting	2023	VLDB	4.1905499e-05
3,335	SnappyData: A Unified Cluster for Streaming, Transactions, and Interactive Analytics	2017	CIDR	7.2023806e-05
3,207	Big Data Analytics with Datalog Queries on Spark	2016	SIGMOD	7.3847098e-05
6,683	Adaptive and Robust Query Execution for Lakehouses at Scale	2024	VLDB	4.9593505e-05
1,265	Jaql: A Scripting Language for Large Scale Semistructured Data Analysis	2011	VLDB	0.00012933537
9,122	Dynamic Speculative Optimizations for SQL Compilation in Apache Spark	2020	VLDB	4.3877539e-05
7,796	Large-scale Complex Analytics on Semi-structured Datasets using AsterixDB and Spark	2016	VLDB	4.6438468e-05
6,661	Scalable Querying of Nested Data	2021	VLDB	4.9663934e-05
11,199	QaaD (Query-as-a-Data): Scalable Execution of Massive Number of Small Queries in Spark	2023	SIGMOD	4.1905499e-05
11,191	dsJSON: A Distributed SQL JSON Processor	2023	SIGMOD	4.1905499e-05