Database Paper Browser

Materialization Optimizations for Feature Selection Workloads

Summary: Presents a feature-selection language and an R-integrated prototype for interactive analytics. Examines materialization optimizations beyond SQL (QR-based decompositions and warmstarts), with a simple cost-based optimizer that yields near-optimal plans and speedups. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 4808
Venue: SIGMOD
Year: 2014
Pagerank: 0.00017053783
Overall Rank: 761 | 94.71%
DOI: 10.1145/2588555.2593678

Incoming Citations (Sorted by Pagerank)

Showing 46 of 46 citing papers.

Rank Citing Paper Year Venue Pagerank
557 SystemML: Declarative Machine Learning on Spark 2016 VLDB 0.00020197988
903 To Join or Not to Join? Thinking Twice about Joins before Feature Selection 2016 SIGMOD 0.0001547016
1,167 Learning Generalized Linear Models Over Normalized Data 2015 SIGMOD 0.00013547713
1,532 Data Management in Machine Learning: Challenges, Techniques, and Systems 2017 SIGMOD 0.00011472681
1,666 HELIX: Holistic Optimization for Accelerating Iterative Machine Learning 2019 VLDB 0.0001096361
1,891 Towards Model-based Pricing for Machine Learning in a Data Marketplace 2019 SIGMOD 0.00010194092
1,940 SliceLine: Fast, Linear-Algebra-based Slice Finding for ML Model Debugging 2021 SIGMOD 0.00010020173
1,967 Compressed Linear Algebra for Large-Scale Machine Learning 2016 VLDB 9.9131712e-05
2,122 SystemDS: A Declarative Machine Learning System for the End-to-End Data Science Lifecycle 2020 CIDR 9.4989076e-05
2,251 Vizdom: Interactive Analytics through Pen and Touch 2015 VLDB 9.1986441e-05
2,255 LINVIEW: Incremental View Maintenance for Complex Analytical Queries 2014 SIGMOD 9.1884983e-05
2,886 VISTA: Optimized System for Declarative Feature Transfer from Deep CNNs at Scale 2020 SIGMOD 7.9612767e-05
3,023 Helix: Accelerating Human-in-the-loop Machine Learning 2018 VLDB 7.6929986e-05
3,319 Sketching Linear Classifiers over Data Streams 2018 SIGMOD 7.226439e-05
3,473 AI Meets Database: AI4DB and DB4AI 2021 SIGMOD 7.062864e-05
4,395 Scalable Asynchronous Gradient Descent Optimization for Out-of-Core Models 2017 VLDB 6.2244283e-05
4,576 The Missing Piece in Complex Analytics: Low Latency, Scalable Model Management and Serving with Velox 2015 CIDR 6.0721464e-05
4,584 Scalable Kernel Density Classification via Threshold-Based Pruning 2017 SIGMOD 6.0668364e-05
4,774 LIMA: Fine-grained Lineage Tracing and Reuse in Machine Learning Systems 2021 SIGMOD 5.9316087e-05
4,785 Demonstration of Santoku: Optimizing Machine Learning over Normalized Data 2015 VLDB 5.9236989e-05
4,802 Resource Elasticity for Large-Scale Machine Learning 2015 SIGMOD 5.9114415e-05
5,257 Probabilistic Demand Forecasting at Scale 2017 VLDB 5.6003925e-05
5,567 Optimizing Data Pipelines for Machine Learning in Feature Stores 2023 VLDB 5.4305348e-05
5,806 BlinkML: Efficient Maximum Likelihood Estimation with Probabilistic Guarantees 2019 SIGMOD 5.3200643e-05
6,053 Optimizing Machine Learning Workloads in Collaborative Environments 2020 SIGMOD 5.2326838e-05
6,330 Efficient Construction of Approximate Ad-Hoc ML models Through Materialization and Reuse 2018 VLDB 5.1077416e-05
6,347 A Relational Framework for Classifier Engineering 2017 PODS 5.1019568e-05
6,469 Materialization and Reuse Optimizations for Production Data Science Pipelines 2022 SIGMOD 5.0519488e-05
6,549 Demonstration of Nimbus: Model-based Pricing for Machine Learning in a Data Marketplace 2019 SIGMOD 5.0175568e-05
6,733 Hindsight Logging for Model Training 2021 VLDB 4.9467666e-05
6,986 A Cost-based Optimizer for Gradient Descent Optimization 2017 SIGMOD 4.8727048e-05
7,407 Intermittent Query Processing 2019 VLDB 4.7373205e-05
7,602 Causal Feature Selection for Algorithmic Fairness 2022 SIGMOD 4.6988081e-05
7,656 Nautilus: An Optimized System for Deep Transfer Learning over Evolving Training Datasets 2022 SIGMOD 4.6871575e-05
7,704 ExDRa: Exploratory Data Science on Federated Raw Data 2021 SIGMOD 4.6733838e-05
8,514 UPLIFT: Parallelization Strategies for Feature Transformations in Machine Learning Workloads 2022 VLDB 4.4944285e-05
8,864 Cerebro: A Layered Data Platform for Scalable Deep Learning 2021 CIDR 4.4326439e-05
8,921 Leveraging Similarity Joins for Signal Reconstruction 2018 VLDB 4.427232e-05
9,223 Intermittent Human-in-the-Loop Model Selection using Cerebro: A Demonstration 2021 VLDB 4.3698672e-05
9,437 BlockJoin: Efficient Matrix Partitioning Through Joins 2017 VLDB 4.3425552e-05
9,912 ElasticNotebook: Enabling Live Migration for Computational Notebooks 2024 VLDB 4.2565279e-05
10,286 QStore: Quantization-Aware Compressed Model Storage 2026 VLDB 4.1945683e-05
11,476 Enforcing Constraints for Machine Learning Systems via Declarative Feature Selection: An Experimental Study 2021 SIGMOD 4.1945683e-05
11,594 TRACER: A Framework for Facilitating Accurate and Interpretable Analytics for High Stakes Applications 2020 SIGMOD 4.1945683e-05
11,639 Regularizing Conjunctive Features for Classification 2019 PODS 4.1945683e-05
11,796 A Declarative Query Processing System for Nowcasting 2017 VLDB 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 4 of 4 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank Cited Paper Year Venue Pagerank
140 The MADlib Analytics Library or MAD Skills, the SQL 2012 VLDB 0.00042270404
168 MAD Skills: New Analysis Practices for Big Data 2009 VLDB 0.00038946305
318 Overview of SciDB: Large Scale Array Storage, Processing and Analysis 2010 SIGMOD 0.00027795661
543 MLbase: A Distributed Machine-learning System 2013 CIDR 0.00020526854

Semantically Similar Papers