Database Paper Browser

Morphing-based Compression for Data-centric ML Pipelines

Summary: BWARE extends AWARE to push workload-aware lossless matrix compression through feature transformations and through data cleaning and augmentation, exploiting the structural redundancy that these transformations introduce. It adds lightweight morphing, which converts one compressed representation into another in place without decompressing, and enables large end-to-end speedups (from days to hours) for data-centric ML pipelines. (summarized by gpt-5-mini on Mar 13 2026)
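
To make the idea of transforming compressed data without decompressing it concrete, here is a minimal Python sketch. It is not BWARE's algorithm or the SystemDS API: the `DictColumn` class, the `compress` and `map_compressed` helpers, and the binning function are all hypothetical names chosen for illustration. The sketch assumes a simple dictionary encoding and shows a value-wise transformation applied once per distinct value, with the row codes only remapped as integers.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DictColumn:
    """Hypothetical dictionary-encoded column: row i holds dictionary[codes[i]]."""
    dictionary: List[float]
    codes: List[int]

    def decompress(self) -> List[float]:
        return [self.dictionary[c] for c in self.codes]


def compress(values: List[float]) -> DictColumn:
    """Dictionary-encode a column (illustrative helper, not a library call)."""
    dictionary = sorted(set(values))
    index = {v: i for i, v in enumerate(dictionary)}
    return DictColumn(dictionary, [index[v] for v in values])


def map_compressed(col: DictColumn, fn: Callable[[float], float]) -> DictColumn:
    """Apply a value-wise function to the compressed form.

    fn is evaluated once per distinct value. Dictionary entries that map to
    the same output are merged, and the row codes are remapped as integers;
    the original values are never materialized per row.
    """
    new_dict: List[float] = []
    new_index = {}
    remap: List[int] = []
    for v in col.dictionary:
        out = fn(v)
        if out not in new_index:
            new_index[out] = len(new_dict)
            new_dict.append(out)
        remap.append(new_index[out])
    return DictColumn(new_dict, [remap[c] for c in col.codes])


def bin_width_10(v: float) -> float:
    """Example feature transformation: equal-width binning with width 10."""
    return float(int(v // 10))


column = compress([3.0, 17.0, 3.0, 12.0, 25.0, 17.0])
binned = map_compressed(column, bin_width_10)
```

In this toy case the binning collapses four distinct values into three, so the transformed column has a smaller dictionary than its input. That is one simple instance of a transformation introducing redundancy that a compressed representation can exploit.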

Paper ID: 14331
Venue: VLDB
Year: 2026
Pagerank: 4.1945683e-05
Overall Rank: 10,291 | 28.41%
DOI: 10.14778/3778092.3778104

Incoming Non-self Citations Over Time

No non-self incoming citations found for this paper in this database.

Authors

Incoming Citations (Sorted by Pagerank)

Showing 0 of 0 citing papers.


Outgoing Citations (Sorted by Pagerank)

Showing 42 of 42 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
21 | C-Store: A Column-oriented DBMS | 2005 | VLDB | 0.00086087497
131 | Integrating Compression and Execution in Column-Oriented Database Systems | 2006 | SIGMOD | 0.0004370331
210 | Gorilla: A Fast, Scalable, In-Memory Time Series Database | 2015 | VLDB | 0.0003404384
254 | Snorkel: Rapid Training Data Creation with Weak Supervision | 2018 | VLDB | 0.00030540555
408 | Database Cracking | 2007 | CIDR | 0.00023953844
557 | SystemML: Declarative Machine Learning on Spark | 2016 | VLDB | 0.00020197988
834 | Learning Linear Regression Models over Factorized Joins | 2016 | SIGMOD | 0.00016135159
1,167 | Learning Generalized Linear Models Over Normalized Data | 2015 | SIGMOD | 0.00013547713
1,215 | Snuba: Automating Weak Supervision to Label Training Data | 2019 | VLDB | 0.0001323375
1,263 | Data Blocks: Hybrid OLTP and OLAP on Compressed Storage using both Vectorization and Compilation | 2016 | SIGMOD | 0.00012982857
1,277 | The Data Civilizer System | 2017 | CIDR | 0.00012879695
1,404 | Responsible Data Management | 2020 | VLDB | 0.00012174977
1,482 | Automating Large-Scale Data Quality Verification | 2018 | VLDB | 0.00011725533
1,700 | Bridging the Archipelago between Row-Stores and Column-Stores for Hybrid Workloads | 2016 | SIGMOD | 0.00010858865
1,967 | Compressed Linear Algebra for Large-Scale Machine Learning | 2016 | VLDB | 9.9131712e-05
2,064 | Chimp: Efficient Lossless Floating Point Compression for Time Series Databases | 2022 | VLDB | 9.6418929e-05
2,122 | SystemDS: A Declarative Machine Learning System for the End-to-End Data Science Lifecycle | 2020 | CIDR | 9.4989076e-05
2,363 | Merging What’s Cracked, Cracking What’s Merged: Adaptive Indexing in Main-Memory Column-Stores | 2011 | VLDB | 8.9580928e-05
3,005 | Clay: Fine-Grained Adaptive Partitioning for General Database Schemas | 2017 | VLDB | 7.7303579e-05
3,644 | BtrBlocks: Efficient Columnar Compression for Data Lakes | 2023 | SIGMOD | 6.8854928e-05
3,745 | DeepSqueeze: Deep Semantic Compression for Tabular Data | 2020 | SIGMOD | 6.7926132e-05
3,787 | White-box Compression: Learning and Exploiting Compact Table Representations | 2020 | CIDR | 6.7674374e-05
3,896 | Updating a Cracked Database | 2007 | SIGMOD | 6.6575888e-05
4,506 | Stochastic Database Cracking: Towards Robust Adaptive Indexing in Main-Memory Column-Stores | 2012 | VLDB | 6.1319277e-05
4,507 | ALP: Adaptive Lossless floating-Point Compression | 2023 | SIGMOD | 6.131017e-05
4,769 | Automated Feature Engineering for Algorithmic Fairness | 2021 | VLDB | 5.934329e-05
4,774 | LIMA: Fine-grained Lineage Tracing and Reuse in Machine Learning Systems | 2021 | SIGMOD | 5.9316087e-05
4,787 | The Relational Data Borg is Learning | 2020 | VLDB | 5.9224501e-05
4,833 | MNC: Structure-Exploiting Sparsity Estimation for Matrix Expressions | 2019 | SIGMOD | 5.8916346e-05
5,236 | Online Deduplication for Databases | 2017 | SIGMOD | 5.611324e-05
5,806 | BlinkML: Efficient Maximum Likelihood Estimation with Probabilistic Guarantees | 2019 | SIGMOD | 5.3200643e-05
6,157 | Compression Aware Physical Database Design | 2011 | VLDB | 5.1801143e-05
6,538 | Tuple-oriented Compression for Large-scale Mini-batch Stochastic Gradient Descent | 2019 | SIGMOD | 5.023239e-05
7,335 | MorphStore: Analytical Query Engine with a Holistic Compression-Enabled Processing Model | 2020 | VLDB | 4.7603723e-05
8,092 | Saga: A Scalable Framework for Optimizing Data Cleaning Pipelines for Machine Learning Applications | 2023 | SIGMOD | 4.587921e-05
8,257 | Automating and Optimizing Data-Centric What-If Analyses on Native Machine Learning Pipelines | 2023 | SIGMOD | 4.5487511e-05
8,514 | UPLIFT: Parallelization Strategies for Feature Transformations in Machine Learning Workloads | 2022 | VLDB | 4.4944285e-05
8,578 | Robust and Budget-Constrained Encoding Configurations for In-Memory Database Systems | 2022 | VLDB | 4.4923477e-05
8,657 | Improving Matrix-vector Multiplication via Lossless Grammar-Compressed Matrices | 2022 | VLDB | 4.4730648e-05
8,786 | AWARE: Workload-aware, Redundancy-exploiting Linear Algebra | 2023 | SIGMOD | 4.4521262e-05
9,595 | High-Ratio Compression for Machine-Generated Data | 2023 | SIGMOD | 4.3194469e-05
9,919 | MorphStore — In-Memory Query Processing based on Morphing Compressed Intermediates LIVE | 2019 | SIGMOD | 4.2561557e-05

Semantically Similar Papers