Database Paper Browser

SystemDS: A Declarative Machine Learning System for the End-to-End Data Science Lifecycle

Summary: SystemDS is an open-source declarative ML system that unifies the end-to-end data science lifecycle (data integration, cleaning, preparation, local/distributed/federated training, debugging, and serving) through a stack of language abstractions. It targets lifecycle-wide optimization to eliminate boundary crossing between data engineering and modeling, and builds on lessons from SystemML to support heterogeneous data and users with diverse expertise. (summarized by gpt-5-mini on Feb 09 2026)

Paper ID: 358
Venue: CIDR
Year: 2020
Pagerank: 9.4989076e-05
Overall Rank: 2,122 | 85.24%
DOI: -

Incoming Citations (Sorted by Pagerank)

Showing 29 of 29 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
1,940 | SliceLine: Fast, Linear-Algebra-based Slice Finding for ML Model Debugging | 2021 | SIGMOD | 0.00010020173
2,163 | Elastic Machine Learning Algorithms in Amazon SageMaker | 2020 | SIGMOD | 9.3949234e-05
2,839 | VolcanoML: Speeding up End-to-End AutoML via Scalable Search Space Decomposition | 2021 | VLDB | 8.0378978e-05
3,254 | Query Processing on Tensor Computation Runtimes | 2022 | VLDB | 7.3161051e-05
4,774 | LIMA: Fine-grained Lineage Tracing and Reuse in Machine Learning Systems | 2021 | SIGMOD | 5.9316087e-05
7,306 | DAPHNE: An Open and Extensible System Infrastructure for Integrated Data Analysis Pipelines | 2022 | CIDR | 4.7678574e-05
7,494 | SubStrat: A Subset-Based Optimization Strategy for Faster AutoML | 2023 | VLDB | 4.7180617e-05
7,704 | ExDRa: Exploratory Data Science on Federated Raw Data | 2021 | SIGMOD | 4.6733838e-05
8,092 | Saga: A Scalable Framework for Optimizing Data Cleaning Pipelines for Machine Learning Applications | 2023 | SIGMOD | 4.587921e-05
8,257 | Automating and Optimizing Data-Centric What-If Analyses on Native Machine Learning Pipelines | 2023 | SIGMOD | 4.5487511e-05
8,262 | FuseME: Distributed Matrix Computation Engine based on Cuboid-based Fused Operator and Plan Generation | 2022 | SIGMOD | 4.5467867e-05
8,279 | Galley: Modern Query Optimization for Sparse Tensor Programs | 2025 | SIGMOD | 4.5435639e-05
8,514 | UPLIFT: Parallelization Strategies for Feature Transformations in Machine Learning Workloads | 2022 | VLDB | 4.4944285e-05
8,620 | PreVision: An Out-of-Core Matrix Computation System with Optimal Buffer Replacement | 2024 | SIGMOD | 4.4837361e-05
8,743 | CtxPipe: Context-aware Data Preparation Pipeline Construction for Machine Learning | 2024 | SIGMOD | 4.456315e-05
9,192 | Hyper-Tune: Towards Efficient Hyper-parameter Tuning at Scale | 2022 | VLDB | 4.3765131e-05
9,222 | Towards an Optimized GROUP BY Abstraction for Large-Scale Machine Learning | 2021 | VLDB | 4.3698672e-05
9,379 | GIO: Generating Efficient Matrix and Frame Readers for Custom Data Formats by Example | 2023 | SIGMOD | 4.3462787e-05
9,694 | EinDecomp: Decomposition of Declaratively-Specified Machine Learning and Numerical Computations for Parallel Execution | 2025 | VLDB | 4.3025567e-05
10,177 | InferF: Declarative Factorization of AI/ML Inferences over Joins | 2026 | SIGMOD | 4.1945683e-05
10,226 | Automated Tensor-Relational Decomposition for Large-Scale Sparse Tensor Computation | 2026 | VLDB | 4.1945683e-05
10,291 | Morphing-based Compression for Data-centric ML Pipelines | 2026 | VLDB | 4.1945683e-05
10,560 | A Systematic Study on Early Stopping Metrics in HPO and the Implications of Uncertainty | 2025 | VLDB | 4.1945683e-05
10,571 | Quantum Data Management in the NISQ Era | 2025 | VLDB | 4.1945683e-05
10,628 | CatDB: Data-catalog-guided, LLM-based Generation of Data-centric ML Pipelines | 2025 | VLDB | 4.1945683e-05
11,288 | To UDFs and Beyond: Demonstration of a Fully Decomposed Data Processor for General Data Wrangling Tasks | 2023 | VLDB | 4.1945683e-05
11,339 | Redundancy Elimination in Distributed Matrix Computation | 2022 | SIGMOD | 4.1945683e-05
11,402 | ReMac: A Matrix Computation System with Redundancy Elimination | 2022 | VLDB | 4.1945683e-05
11,476 | Enforcing Constraints for Machine Learning Systems via Declarative Feature Selection: An Experimental Study | 2021 | SIGMOD | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 38 of 38 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
112 | Potter's Wheel: An Interactive Data Cleaning System | 2001 | VLDB | 0.00047045036
140 | The MADlib Analytics Library or MAD Skills, the SQL | 2012 | VLDB | 0.00042270404
168 | MAD Skills: New Analysis Practices for Big Data | 2009 | VLDB | 0.00038946305
192 | HoloClean: Holistic Data Repairs with Probabilistic Inference | 2017 | VLDB | 0.00035728858
557 | SystemML: Declarative Machine Learning on Spark | 2016 | VLDB | 0.00020197988
658 | Towards a Unified Architecture for in-RDBMS Analytics | 2012 | SIGMOD | 0.00018506577
761 | Materialization Optimizations for Feature Selection Workloads | 2014 | SIGMOD | 0.00017053783
834 | Learning Linear Regression Models over Factorized Joins | 2016 | SIGMOD | 0.00016135159
921 | Democratizing Data Science through Interactive Curation of ML Pipelines | 2019 | SIGMOD | 0.00015337438
1,078 | Model Management 2.0: Manipulating Richer Mappings | 2007 | SIGMOD | 0.00014245848
1,099 | Interpretable and Informative Explanations of Outcomes | 2015 | VLDB | 0.00014096312
1,167 | Learning Generalized Linear Models Over Normalized Data | 2015 | SIGMOD | 0.00013547713
1,337 | HoloDetect: Few-Shot Learning for Error Detection | 2019 | SIGMOD | 0.00012497164
1,343 | NoDB: Efficient Query Execution on Raw Data Files | 2012 | SIGMOD | 0.00012482538
1,350 | Northstar: An Interactive Data Science System | 2018 | VLDB | 0.00012431059
1,402 | Hybrid Parallelization Strategies for Large-Scale Machine Learning in SystemML | 2014 | VLDB | 0.00012180605
1,482 | Automating Large-Scale Data Quality Verification | 2018 | VLDB | 0.00011725533
1,666 | HELIX: Holistic Optimization for Accelerating Iterative Machine Learning | 2019 | VLDB | 0.0001096361
1,750 | Weld: A Common Runtime for High Performance Data Analytics | 2017 | CIDR | 0.00010683647
1,967 | Compressed Linear Algebra for Large-Scale Machine Learning | 2016 | VLDB | 9.9131712e-05
2,097 | Predictive Interaction for Data Transformation | 2015 | CIDR | 9.5489822e-05
2,152 | MISTIQUE: A System to Store and Query Model Intermediates for Model Diagnosis | 2018 | SIGMOD | 9.4239787e-05
2,350 | An Intermediate Representation for Optimizing Machine Learning Pipelines | 2019 | VLDB | 8.9788641e-05
2,573 | Query Optimization for Dynamic Imputation | 2017 | VLDB | 8.518235e-05
2,693 | An Architecture for Recycling Intermediates in a Column-store | 2009 | SIGMOD | 8.2883398e-05
2,700 | Filter Before You Parse: Faster Analytics on Raw Data with Sparser | 2018 | VLDB | 8.2728509e-05
2,888 | Sato: Contextual Semantic Type Detection in Tables | 2020 | VLDB | 7.9594996e-05
2,928 | WANalytics: Analytics for a Geo-Distributed Data-Intensive World | 2015 | CIDR | 7.8812874e-05
2,934 | AIDA - Abstraction for Advanced In-Database Analytics | 2018 | VLDB | 7.8595778e-05
3,918 | On Optimizing Operator Fusion Plans for Large-Scale Machine Learning in SystemML | 2018 | VLDB | 6.6315176e-05
4,197 | Incremental View Maintenance with Triple Lock Factorization Benefits | 2018 | SIGMOD | 6.367895e-05
4,326 | Fast Queries Over Heterogeneous Data Through Engine Customization | 2016 | VLDB | 6.288323e-05
4,607 | Data Integration and Machine Learning: A Natural Synergy | 2018 | SIGMOD | 6.0538827e-05
4,802 | Resource Elasticity for Large-Scale Machine Learning | 2015 | SIGMOD | 5.9114415e-05
4,833 | MNC: Structure-Exploiting Sparsity Estimation for Matrix Expressions | 2019 | SIGMOD | 5.8916346e-05
5,294 | GLADE: Big Data Analytics Made Easy | 2012 | SIGMOD | 5.5810654e-05
8,078 | Meta-Dataflows: Efficient Exploratory Dataflow Jobs | 2018 | SIGMOD | 4.5914967e-05
9,437 | BlockJoin: Efficient Matrix Partitioning Through Joins | 2017 | VLDB | 4.3425552e-05

Semantically Similar Papers