Database Paper Browser

Matei Zaharia

Author ID: 1160
ORCID: -
Links: (found by gpt-5.2 on Feb 8th, 2026)
Most Frequent Institution: Stanford University
Pagerank: 0.39736622
Overall Rank: 92 | 99.57%
Paper Count: 48

Incoming Non-self Citations Over Time

Total yearly non-self incoming citations across all papers by this author.
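
The aggregation described above can be sketched as follows. This is a minimal illustration, not the site's actual implementation: the data shapes (`papers`, `citations`), the function name, and the sample IDs are hypothetical, and it assumes that a citation is a self-citation when the author also appears on the citing paper, and that each citation is attributed to the citing paper's year.

```python
from collections import Counter


def yearly_non_self_citations(author_id, papers, citations):
    """Count incoming citations per year, skipping self-citations.

    papers:    dict mapping paper id -> {"year": int, "authors": set of author ids}
    citations: iterable of (citing_paper_id, cited_paper_id) pairs
    Both structures are hypothetical stand-ins for the site's data.
    """
    counts = Counter()
    for citing_id, cited_id in citations:
        citing, cited = papers[citing_id], papers[cited_id]
        # Only citations that point at one of this author's papers count.
        if author_id not in cited["authors"]:
            continue
        # Assumption: the citation is a self-citation if the author is
        # also an author of the citing paper.
        if author_id in citing["authors"]:
            continue
        counts[citing["year"]] += 1
    return dict(sorted(counts.items()))


# Made-up sample data; only the author ID 1160 comes from the profile.
papers = {
    "p1": {"year": 2015, "authors": {1160, 7}},
    "p2": {"year": 2017, "authors": {1160}},
    "p3": {"year": 2018, "authors": {42}},
    "p4": {"year": 2018, "authors": {1160, 42}},
    "p5": {"year": 2019, "authors": {99}},
}
citations = [
    ("p3", "p1"), ("p4", "p1"), ("p5", "p1"),
    ("p5", "p2"), ("p5", "p3"), ("p2", "p1"),
]
result = yearly_non_self_citations(1160, papers, citations)
print(result)
```

A stricter definition, where a citation is excluded if the citing and cited papers share any author at all, would replace the second check with a set intersection on the two author sets.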

Publications by Paper Pagerank

Showing 48 of 48 publications.

Rank | Title | Year | Venue | Pagerank
66 | Spark SQL: Relational Data Processing in Spark | 2015 | SIGMOD | 0.00061639801
316 | NoScope: Optimizing Neural Network Queries over Video at Scale | 2017 | VLDB | 0.00027988668
542 | Shark: SQL and Rich Analytics at Scale | 2013 | SIGMOD | 0.00020595648
696 | BlazeIt: Optimizing Declarative Aggregation and Limit Queries for Neural Network-Based Video Analytics | 2020 | VLDB | 0.00018048935
746 | Delta Lake: High-Performance ACID Table Storage over Cloud Object Stores | 2020 | VLDB | 0.00017326979
1,377 | Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics | 2021 | CIDR | 0.00012296941
1,548 | Structured Streaming: A Declarative API for Real-Time Applications in Apache Spark | 2018 | SIGMOD | 0.00011431383
1,657 | Challenges and Opportunities in DNN-Based Video Analytics: A Demonstration of the BlazeIt Video Query Engine | 2019 | CIDR | 0.00010987105
1,750 | Weld: A Common Runtime for High Performance Data Analytics | 2017 | CIDR | 0.00010683647
2,014 | Voodoo - A Vector Algebra for Portable Database Performance on Modern Hardware | 2016 | VLDB | 9.7904029e-05
2,152 | MISTIQUE: A System to Store and Query Model Intermediates for Model Diagnosis | 2018 | SIGMOD | 9.4239787e-05
2,154 | DIFF: A Relational Interface for Large-Scale Data Explanation | 2019 | VLDB | 9.4208667e-05
2,473 | Photon: A Fast Query Engine for Lakehouse Systems | 2022 | SIGMOD | 8.7237281e-05
2,488 | Shark: Fast Data Analysis Using Coarse-grained Distributed Memory | 2012 | SIGMOD | 8.6683713e-05
2,523 | ACORN: Performant and Predicate-Agnostic Search Over Vector Embeddings and Structured Data | 2024 | SIGMOD | 8.604576e-05
2,700 | Filter Before You Parse: Faster Analytics on Raw Data with Sparser | 2018 | VLDB | 8.2728509e-05
2,896 | Evaluating End-to-End Optimization for Data Analytics Applications in Weld | 2018 | VLDB | 7.9452051e-05
3,293 | Jointly Optimizing Preprocessing and Inference for DNN-based Visual Analytics | 2021 | VLDB | 7.2629834e-05
3,331 | A Demonstration of Willump: A Statistically-Aware End-to-end Optimizer for Machine Learning Inference | 2020 | VLDB | 7.2131599e-05
3,359 | Text2SQL is Not Enough: Unifying AI and Databases with TAG | 2025 | CIDR | 7.1744146e-05
3,535 | Scaling Spark in the Real World: Performance and Usability | 2015 | VLDB | 6.9992495e-05
3,558 | Approximate Selection with Guarantees using Proxies | 2020 | VLDB | 6.9765724e-05
3,688 | DBOS: A DBMS-oriented Operating System | 2022 | VLDB | 6.8414694e-05
4,501 | TASTI: Semantic Indexes for Machine Learning-based Queries over Unstructured Data | 2022 | SIGMOD | 6.137686e-05
4,567 | Optimizing Video Analytics with Declarative Model Relationships | 2023 | VLDB | 6.080526e-05
4,641 | VIVA: An End-to-End System for Interactive Video Analytics | 2022 | CIDR | 6.027004e-05
4,712 | Accelerating Approximate Aggregation Queries with Expensive Predicates | 2021 | VLDB | 5.9787986e-05
5,318 | Analyzing and Comparing Lakehouse Storage Systems | 2023 | CIDR | 5.5715872e-05
6,134 | Finding Label and Model Errors in Perception Data With Learned Observation Assertions | 2022 | SIGMOD | 5.1943414e-05
6,784 | SparkR: Scaling R Programs with Spark | 2016 | SIGMOD | 4.9265155e-05
7,059 | Adaptive and Robust Query Execution for Lakehouses at Scale | 2024 | VLDB | 4.8477825e-05
7,631 | A Progress Report on DBOS: A Database-oriented Operating System | 2022 | CIDR | 4.6917915e-05
7,928 | Accelerating Aggregation Queries on Unstructured Streams of Data | 2023 | VLDB | 4.613455e-05
8,039 | Epoxy: ACID Transactions Across Diverse Data Stores | 2023 | VLDB | 4.6000795e-05
8,150 | Parallelism-Optimizing Data Placement for Faster Data-Parallel Computations | 2023 | VLDB | 4.5746638e-05
8,293 | Challenges and Opportunities for Autonomous Vehicle Query Systems | 2021 | CIDR | 4.5435639e-05
8,469 | Semantic Operators and Their Optimization: Enabling LLM-Based Data Processing with Accuracy Guarantees in LOTUS | 2025 | VLDB | 4.5041113e-05
8,608 | Unity Catalog: Open and Universal Governance for the Lakehouse and Beyond | 2025 | SIGMOD | 4.4853979e-05
8,663 | Transactions Make Debugging Easy | 2023 | CIDR | 4.4722808e-05
9,093 | Databricks Lakeguard: Supporting Fine-grained Access Control and Multi-user Capabilities for Apache Spark Workloads | 2025 | SIGMOD | 4.398149e-05
9,555 | Bringing the Operational and Analytical Worlds Together with Lakebase | 2025 | VLDB | 4.3254416e-05
9,584 | Introduction to Spark 2.0 for Database Researchers | 2016 | SIGMOD | 4.3218691e-05
9,992 | Supporting Our AI Overlords: Redesigning Data Systems to be Agent-First | 2026 | CIDR | 4.1945683e-05
11,255 | R3: Record-Replay-Retroaction for Database-Backed Applications | 2023 | VLDB | 4.1945683e-05
13,096 | Blink Twice - Automatic Workload Pinning and Regression Detection for Versionless Apache Spark using Retries | 2025 | SIGMOD | -
13,124 | Delta Sharing: An Open Protocol for Cross-Platform Data Sharing | 2025 | VLDB | -
13,227 | Cloud Data Systems: What are the Opportunities for the Database Research Community? | 2022 | VLDB | -
13,269 | Designing Production-Friendly Machine Learning | 2021 | VLDB | -

Frequent Co-authors

Co-authored at least 5 papers.

Co-author | Shared Papers | Rank | Pagerank
Peter D. Bailis | 15 | 127 | 0.33278392
Reynold Xin | 15 | 239 | 0.21746365
Daniel Kang | 12 | 328 | 0.16947809
Peter Kraft | 10 | 777 | 0.083247146
Utkarsh Agarwal | 9 | 830 | 0.079633371
Sean Rhea | 7 | 55 | 0.52599089
Michael Armbrust | 7 | 450 | 0.12692701
Ali Ghodsi | 6 | 404 | 0.14048913
Christos Kozyrakis | 6 | 1,126 | 0.060590969
Michael Stonebraker | 5 | 6 | 1.0621118
Qian Li | 5 | 1,739 | 0.04154398