Database Paper Browser

Automating and Optimizing Data-Centric What-If Analyses on Native Machine Learning Pipelines

Summary: mlwhatif lets users declaratively specify data-centric what-if analyses over ML pipelines and automatically generates the required pipeline variants via patches. A four-rule optimizer executes the variants, and instrumented dataflow plans enable linear speedups (up to 13x) and data-size independence. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 6631
Venue: SIGMOD
Year: 2023
Pagerank: 4.5487511e-05
Overall Rank: 8,257 | 42.56%
DOI: 10.1145/3589273

Incoming Citations (Sorted by Pagerank)

Showing 6 of 6 citing papers.

Outgoing Citations (Sorted by Pagerank)

Showing 23 of 23 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
179 | Efficient and Extensible Algorithms for Multi Query Optimization | 2000 | SIGMOD | 0.00037672155
185 | DuckDB: an Embeddable Analytical Database | 2019 | SIGMOD | 0.00036538405
517 | Can Foundation Models Wrangle Your Data? | 2023 | VLDB | 0.00021169035
791 | ActiveClean: Interactive Data Cleaning For Statistical Modeling | 2016 | VLDB | 0.00016629664
1,298 | Efficient Task-Specific Data Valuation for Nearest Neighbor Algorithms | 2019 | VLDB | 0.00012758104
1,337 | HoloDetect: Few-Shot Learning for Error Detection | 2019 | SIGMOD | 0.00012497164
1,404 | Responsible Data Management | 2020 | VLDB | 0.00012174977
1,427 | Towards Scalable Dataframe Systems | 2020 | VLDB | 0.0001204248
1,646 | Caravan: Provisioning for What-If Analysis | 2013 | CIDR | 0.00011036992
1,666 | HELIX: Holistic Optimization for Accelerating Iterative Machine Learning | 2019 | VLDB | 0.0001096361
1,867 | Interpretable Data-Based Explanations for Fairness Debugging | 2022 | SIGMOD | 0.00010272055
2,122 | SystemDS: A Declarative Machine Learning System for the End-to-End Data Science Lifecycle | 2020 | CIDR | 9.4989076e-05
2,284 | Cost-Based Optimization of Decision Support Queries using Transient-Views | 1998 | SIGMOD | 9.1053836e-05
2,456 | Production Machine Learning Pipelines: Empirical Analysis and Optimization Opportunities | 2021 | SIGMOD | 8.7733773e-05
2,896 | Evaluating End-to-End Optimization for Data Analytics Applications in Weld | 2018 | VLDB | 7.9452051e-05
3,407 | End-to-end Optimization of Machine Learning Prediction Queries | 2022 | SIGMOD | 7.1295646e-05
4,664 | Efficient Answering of Historical What-if Queries | 2022 | SIGMOD | 6.0127053e-05
4,734 | MLINSPECT: A Data Distribution Debugger for Machine Learning Pipelines | 2021 | SIGMOD | 5.9615384e-05
5,607 | HYPER: Hypothetical Reasoning With What-If and How-To Queries Using a Probabilistic Causal Approach | 2022 | SIGMOD | 5.4137872e-05
6,469 | Materialization and Reuse Optimizations for Production Data Science Pipelines | 2022 | SIGMOD | 5.0519488e-05
8,514 | UPLIFT: Parallelization Strategies for Feature Transformations in Machine Learning Workloads | 2022 | VLDB | 4.4944285e-05
8,853 | Complaint-Driven Training Data Debugging at Interactive Speeds | 2022 | SIGMOD | 4.4350727e-05
11,310 | Screening Native ML Pipelines with “ArgusEyes” | 2022 | CIDR | 4.1945683e-05

Semantically Similar Papers