Database Paper Browser

UPLIFT: Parallelization Strategies for Feature Transformations in Machine Learning Workloads

Summary: UPLIFT introduces data-characteristics-aware parallelization for feature transformations, built on a fine-grained task graph and cache-conscious execution. Results on FTBench show speedups of up to 31.6x over state-of-the-art systems, along with broad applicability. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 12776
Venue: VLDB
Year: 2022
Pagerank: 4.4944285e-05
Overall Rank: 8,514 | 40.78%
DOI: 10.14778/3551793.3551842

Incoming Citations (Sorted by Pagerank)

Showing 4 of 4 citing papers.

Outgoing Citations (Sorted by Pagerank)

Showing 45 of 45 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
35 | MonetDB/X100: Hyper-Pipelining Query Execution | 2005 | CIDR | 0.00076197749
60 | Efficiently Compiling Efficient Query Plans for Modern Hardware | 2011 | VLDB | 0.00064439773
254 | Snorkel: Rapid Training Data Creation with Weak Supervision | 2018 | VLDB | 0.00030540555
351 | Sort vs. Hash Revisited: Fast Join Implementation on Modern Multi-Core CPUs | 2009 | VLDB | 0.0002636504
404 | Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited | 2014 | VLDB | 0.00024143076
536 | The LDBC Social Network Benchmark: Interactive Workload | 2015 | SIGMOD | 0.00020722862
557 | SystemML: Declarative Machine Learning on Spark | 2016 | VLDB | 0.00020197988
585 | Massively Parallel Sort-Merge Joins in Main Memory Multi-Core Database Systems | 2012 | VLDB | 0.00019706145
667 | Incremental Knowledge Base Construction Using DeepDive | 2015 | VLDB | 0.00018440557
683 | Cerebro: A Data System for Optimized Deep Learning Model Selection | 2020 | VLDB | 0.00018195476
727 | On Synopses for Distinct-Value Estimation Under Multiset Operations | 2007 | SIGMOD | 0.00017508726
761 | Materialization Optimizations for Feature Selection Workloads | 2014 | SIGMOD | 0.00017053783
1,215 | Snuba: Automating Weak Supervision to Label Training Data | 2019 | VLDB | 0.0001323375
1,402 | Hybrid Parallelization Strategies for Large-Scale Machine Learning in SystemML | 2014 | VLDB | 0.00012180605
1,420 | Data Management Challenges in Production Machine Learning | 2017 | SIGMOD | 0.00012057956
1,427 | Towards Scalable Dataframe Systems | 2020 | VLDB | 0.0001204248
1,482 | Automating Large-Scale Data Quality Verification | 2018 | VLDB | 0.00011725533
1,666 | HELIX: Holistic Optimization for Accelerating Iterative Machine Learning | 2019 | VLDB | 0.0001096361
1,727 | BigBench: Towards an Industry Standard Benchmark for Big Data Analytics | 2013 | SIGMOD | 0.00010740936
1,804 | An Experimental Comparison of Thirteen Relational Equi-Joins in Main Memory | 2016 | SIGMOD | 0.00010501185
1,967 | Compressed Linear Algebra for Large-Scale Machine Learning | 2016 | VLDB | 9.9131712e-05
2,122 | SystemDS: A Declarative Machine Learning System for the End-to-End Data Science Lifecycle | 2020 | CIDR | 9.4989076e-05
2,170 | tf.data: A Machine Learning Data Processing Framework | 2021 | VLDB | 9.3821603e-05
2,249 | Orca: A Modular Query Optimizer Architecture for Big Data | 2014 | SIGMOD | 9.2034693e-05
2,456 | Production Machine Learning Pipelines: Empirical Analysis and Optimization Opportunities | 2021 | SIGMOD | 8.7733773e-05
2,623 | GenBase: A Complex Analytics Genomics Benchmark | 2014 | SIGMOD | 8.4374366e-05
2,848 | Exploiting Matrix Dependency for Efficient Distributed Matrix Computation | 2015 | SIGMOD | 8.0208832e-05
3,721 | To Partition, or Not to Partition, That is the Join Question in a Real System | 2021 | SIGMOD | 6.8179379e-05
3,763 | Flexible Rule-Based Decomposition and Metadata Independence in Modin: A Parallel Dataframe System | 2022 | VLDB | 6.7801795e-05
3,918 | On Optimizing Operator Fusion Plans for Large-Scale Machine Learning in SystemML | 2018 | VLDB | 6.6315176e-05
3,948 | A Comparative Evaluation of Systems for Scalable Linear Algebra-based Analytics | 2018 | VLDB | 6.5959084e-05
4,196 | Overton: A Data System for Monitoring and Improving Machine-Learned Products | 2020 | CIDR | 6.3686231e-05
4,261 | Parallelizing Query Optimization | 2008 | VLDB | 6.31244e-05
4,505 | SPOOF: Sum-Product Optimization and Operator Fusion for Large-Scale Machine Learning | 2017 | CIDR | 6.1327108e-05
4,769 | Automated Feature Engineering for Algorithmic Fairness | 2021 | VLDB | 5.934329e-05
4,774 | LIMA: Fine-grained Lineage Tracing and Reuse in Machine Learning Systems | 2021 | SIGMOD | 5.9316087e-05
4,833 | MNC: Structure-Exploiting Sparsity Estimation for Matrix Expressions | 2019 | SIGMOD | 5.8916346e-05
5,087 | Accelerating Queries with Group-By and Join by Groupjoin | 2011 | VLDB | 5.7075009e-05
5,242 | Towards Benchmarking Feature Type Inference for AutoML Platforms | 2021 | SIGMOD | 5.6074743e-05
6,053 | Optimizing Machine Learning Workloads in Collaborative Environments | 2020 | SIGMOD | 5.2326838e-05
6,228 | Managing ML Pipelines: Feature Stores and the Coming Wave of Embedding Ecosystems | 2021 | VLDB | 5.1470042e-05
7,470 | The Case for Deep Query Optimisation | 2020 | CIDR | 4.7201897e-05
7,704 | ExDRa: Exploratory Data Science on Federated Raw Data | 2021 | SIGMOD | 4.6733838e-05
7,723 | Mind the Gap: Bridging Multi-Domain Query Workloads with EmptyHeaded | 2017 | VLDB | 4.6676712e-05
9,001 | The Power of Nested Parallelism in Big Data Processing – Hitting Three Flies with One Slap – | 2021 | SIGMOD | 4.4107627e-05

Semantically Similar Papers