Database Paper Browser

Selecting Subexpressions to Materialize at Datacenter Scale

Summary: BigSubs is an ILP-based method that selects overlapping subexpressions to materialize across tens of thousands of shared analytics jobs. It casts the selection as a bipartite graph labeling problem and solves it with a distributed, vertex-centric algorithm in SCOPE, saving up to 40% of machine-hours at datacenter scale. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 11785
Venue: VLDB
Year: 2018
Pagerank: 0.00010082599
Overall Rank: 1,922 | 86.64%
DOI: 10.14778/3192965.3192971

Incoming Citations (Sorted by Pagerank)

Showing 36 of 36 citing papers.

| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1,666 | HELIX: Holistic Optimization for Accelerating Iterative Machine Learning | 2019 | VLDB | 0.0001096361 |
| 3,606 | EVA: A Symbolic Approach to Accelerating Exploratory Video Analytics with Materialized Views | 2022 | SIGMOD | 6.9260354e-05 |
| 3,625 | Cost Models for Big Data Query Processing: Learning, Retrofitting, and Our Findings | 2020 | SIGMOD | 6.9055212e-05 |
| 3,901 | Automated Verification of Query Equivalence Using Satisfiability Modulo Theories | 2019 | VLDB | 6.6499845e-05 |
| 4,152 | openGauss: An Autonomous Database System | 2021 | VLDB | 6.4060406e-05 |
| 4,623 | Automated Generation of Materialized Views in Oracle | 2020 | VLDB | 6.0411909e-05 |
| 4,774 | LIMA: Fine-grained Lineage Tracing and Reuse in Machine Learning Systems | 2021 | SIGMOD | 5.9316087e-05 |
| 5,023 | GenRewrite: Query Rewriting via Large Language Models | 2026 | SIGMOD | 5.75363e-05 |
| 5,567 | Optimizing Data Pipelines for Machine Learning in Feature Stores | 2023 | VLDB | 5.4305348e-05 |
| 5,952 | Eraser: Eliminating Performance Regression on Learned Query Optimizer | 2024 | VLDB | 5.2591691e-05 |
| 6,149 | Crystal: A Unified Cache Storage System for Analytical Databases | 2021 | VLDB | 5.1847534e-05 |
| 6,261 | The Cosmos Big Data Platform at Microsoft: Over a Decade of Progress and a Decade to Look Forward | 2021 | VLDB | 5.1350714e-05 |
| 6,988 | CrocodileDB: Efficient Database Execution through Intelligent Deferment | 2020 | CIDR | 4.8718019e-05 |
| 7,128 | Jigsaw: A Data Storage and Query Processing Engine for Irregular Table Partitioning | 2021 | SIGMOD | 4.8230171e-05 |
| 7,296 | Multi-Tenant Cloud Data Services: State-of-the-Art, Challenges and Opportunities | 2022 | SIGMOD | 4.7723197e-05 |
| 7,407 | Intermittent Query Processing | 2019 | VLDB | 4.7373205e-05 |
| 7,461 | Scalable Multi-Query Execution using Reinforcement Learning | 2021 | SIGMOD | 4.723898e-05 |
| 8,131 | Sibyl: Forecasting Time-Evolving Query Workloads | 2024 | SIGMOD | 4.5784634e-05 |
| 8,197 | SparkCruise: Workload Optimization in Managed Spark Clusters at Microsoft | 2021 | VLDB | 4.5607121e-05 |
| 8,295 | View Selection over Knowledge Graphs in Triple Stores | 2021 | VLDB | 4.5435639e-05 |
| 8,442 | SageDB: An Instance-Optimized Data Analytics System | 2022 | VLDB | 4.5120602e-05 |
| 8,506 | New Query Optimization Techniques in the Spark Engine of Azure Synapse | 2022 | VLDB | 4.4957661e-05 |
| 8,758 | Hyperspace: The Indexing Subsystem of Azure Synapse | 2021 | VLDB | 4.456315e-05 |
| 8,783 | GEqO: ML-Accelerated Semantic Equivalence Detection | 2023 | SIGMOD | 4.452825e-05 |
| 8,854 | Optimizing the cloud? Don't train models. Build oracles! | 2024 | CIDR | 4.4349047e-05 |
| 9,194 | Phoebe: A Learning-based Checkpoint Optimizer | 2021 | VLDB | 4.3761777e-05 |
| 9,416 | When sweet and cute isn't enough anymore: Solving scalability issues in Python Pandas with Grizzly | 2020 | CIDR | 4.3441378e-05 |
| 9,735 | SparkCruise: Handsfree Computation Reuse in Spark | 2019 | VLDB | 4.2942813e-05 |
| 9,795 | UniView: A Unified Autonomous Materialized View Management System for Various Databases | 2024 | VLDB | 4.2818172e-05 |
| 9,819 | Generating Application-Specific Data Layouts for In-memory Databases | 2019 | VLDB | 4.2774401e-05 |
| 10,761 | SIEVE: Effective Filtered Vector Search with Collection of Indexes | 2025 | VLDB | 4.1945683e-05 |
| 10,850 | Mayura: Exploiting Similarities in Motifs for Temporal Co-Mining | 2025 | VLDB | 4.1945683e-05 |
| 10,890 | Oligolithic Cross-task Optimizations across Isolated Workloads* | 2024 | CIDR | 4.1945683e-05 |
| 11,197 | QaaD (Query-as-a-Data): Scalable Execution of Massive Number of Small Queries in Spark | 2023 | SIGMOD | 4.1945683e-05 |
| 11,341 | Juggler: Autonomous Cost Optimization and Performance Prediction of Big Data Applications | 2022 | SIGMOD | 4.1945683e-05 |
| 13,196 | PikePlace: Generating Intelligence for Marketplace Datasets | 2023 | VLDB | - |

Outgoing Citations (Sorted by Pagerank)

Showing 27 of 27 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 4 | Pregel: A System for Large-Scale Graph Processing | 2010 | SIGMOD | 0.0019005923 |
| 11 | Implementing Data Cubes Efficiently | 1996 | SIGMOD | 0.0011708144 |
| 41 | NiagaraCQ: A Scalable Continuous Query System for Internet Databases | 2000 | SIGMOD | 0.00073964959 |
| 70 | Hive - A Warehousing Solution Over a Map-Reduce Framework | 2009 | VLDB | 0.00059533166 |
| 158 | Automated Selection of Materialized Views and Indexes for SQL Databases | 2000 | VLDB | 0.00040071492 |
| 179 | Efficient and Extensible Algorithms for Multi Query Optimization | 2000 | SIGMOD | 0.00037672155 |
| 591 | TelegraphCQ: Continuous Dataflow Processing | 2003 | SIGMOD | 0.00019569071 |
| 830 | Main-Memory Scan Sharing For Multi-Core CPUs | 2008 | VLDB | 0.00016171897 |
| 851 | The case against specialized graph analytics engines | 2015 | CIDR | 0.0001594441 |
| 947 | MRShare: Sharing Across Multiple Queries in MapReduce | 2010 | VLDB | 0.00015114576 |
| 1,026 | Cooperative Scans: Dynamic Bandwidth Sharing in a DBMS | 2007 | VLDB | 0.00014589172 |
| 1,353 | Data Warehouse Configuration | 1997 | VLDB | 0.00012410919 |
| 1,476 | Efficient Exploitation of Similar Subexpressions for Query Processing | 2007 | SIGMOD | 0.00011779092 |
| 2,205 | ReStore: Reusing Results of MapReduce Jobs | 2012 | VLDB | 9.2920002e-05 |
| 2,693 | An Architecture for Recycling Intermediates in a Column-store | 2009 | SIGMOD | 8.2883398e-05 |
| 2,709 | Vertexica: Your Relational Friend for Graph Analytics! | 2014 | VLDB | 8.2530203e-05 |
| 2,817 | Recurring Job Optimization in Scope | 2012 | SIGMOD | 8.0677653e-05 |
| 2,925 | Shared Workload Optimization | 2014 | VLDB | 7.888494e-05 |
| 3,038 | Azure Data Lake Store: A Hyperscale Distributed File Service for Big Data Analytics | 2017 | SIGMOD | 7.6717218e-05 |
| 3,462 | Efficient and Provable Multi-Query Optimization | 2017 | PODS | 7.0703696e-05 |
| 3,703 | Multi-Query Optimization in MapReduce Framework | 2014 | VLDB | 6.8289978e-05 |
| 4,174 | Computation Reuse in Analytics Job Service at Microsoft | 2018 | SIGMOD | 6.3856219e-05 |
| 6,075 | Opportunistic Physical Design for Big Data Analytics | 2014 | SIGMOD | 5.223901e-05 |
| 7,207 | Kodiak: Leveraging Materialized Views For Very Low-Latency Analytics Over High-Dimensional Web-Scale Data | 2016 | VLDB | 4.800763e-05 |
| 7,689 | ROBUS: Fair Cache Allocation for Data-parallel Workloads | 2017 | SIGMOD | 4.6765769e-05 |
| 8,251 | View Selection in Semantic Web Databases | 2012 | VLDB | 4.5497619e-05 |
| 8,826 | Delta: Scalable Data Dissemination under Capacity Constraints | 2014 | VLDB | 4.441364e-05 |

Semantically Similar Papers