Database Paper Browser

Democratizing Data Science through Interactive Curation of ML Pipelines

Summary: An interactive AutoML system that lets scientists curate ML pipelines. It combines query optimization, cost-based bandits, and Bayesian optimization to reach interactive latency and to beat expert solutions on unseen data across 300+ datasets. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 5675
Venue: SIGMOD
Year: 2019
Pagerank: 0.00015337438
Overall Rank: 921 | 93.60%
DOI: 10.1145/3299869.3319863

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 27 of 27 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
1,463 | ARDA: Automatic Relational Data Augmentation for Machine Learning | 2020 | VLDB | 0.00011869295
1,751 | Auctus: A Dataset Search Engine for Data Discovery and Augmentation | 2021 | VLDB | 0.00010683295
2,122 | SystemDS: A Declarative Machine Learning System for the End-to-End Data Science Lifecycle | 2020 | CIDR | 9.4989076e-05
2,321 | DBPal: A Fully Pluggable NL2SQL Training Pipeline | 2020 | SIGMOD | 9.03609e-05
3,934 | SimpleTS: An Efficient and Universal Model Selection Framework for Time Series Forecasting | 2023 | VLDB | 6.6175631e-05
4,456 | AutoOD: Automatic Outlier Detection | 2023 | SIGMOD | 6.1704203e-05
4,554 | A Demonstration of AutoOD: A Self-Tuning Anomaly Detection System | 2022 | VLDB | 6.0911296e-05
4,557 | Distributed Deep Learning on Data Systems: A Comparative Analysis of Approaches | 2021 | VLDB | 6.087611e-05
4,774 | LIMA: Fine-grained Lineage Tracing and Reuse in Machine Learning Systems | 2021 | SIGMOD | 5.9316087e-05
4,957 | Doing More with Less: Characterizing Dataset Downsampling for AutoML | 2021 | VLDB | 5.8035715e-05
5,429 | DiffPrep: Differentiable Data Preprocessing Pipeline Search for Learning over Tabular Data | 2023 | SIGMOD | 5.5087325e-05
6,053 | Optimizing Machine Learning Workloads in Collaborative Environments | 2020 | SIGMOD | 5.2326838e-05
7,311 | The Machine Learning Bazaar: Harnessing the ML Ecosystem for Effective System Development | 2020 | SIGMOD | 4.7656884e-05
7,704 | ExDRa: Exploratory Data Science on Federated Raw Data | 2021 | SIGMOD | 4.6733838e-05
8,092 | Saga: A Scalable Framework for Optimizing Data Cleaning Pipelines for Machine Learning Applications | 2023 | SIGMOD | 4.587921e-05
8,163 | Capturing and Querying Fine-grained Provenance of Preprocessing Pipelines in Data Science | 2021 | VLDB | 4.5723431e-05
8,177 | DORIAN in action: Assisted Design of Data Science Pipelines | 2022 | VLDB | 4.5673266e-05
8,743 | CtxPipe: Context-aware Data Preparation Pipeline Construction for Machine Learning | 2024 | SIGMOD | 4.456315e-05
8,828 | HAIPipe: Combining Human-generated and Machine-generated Pipelines for Data Preparation | 2023 | SIGMOD | 4.4407488e-05
9,192 | Hyper-Tune: Towards Efficient Hyper-parameter Tuning at Scale | 2022 | VLDB | 4.3765131e-05
10,252 | CAPS: Cost-Aware ML Pipeline Selection | 2026 | VLDB | 4.1945683e-05
10,560 | A Systematic Study on Early Stopping Metrics in HPO and the Implications of Uncertainty | 2025 | VLDB | 4.1945683e-05
10,628 | CatDB: Data-catalog-guided, LLM-based Generation of Data-centric ML Pipelines | 2025 | VLDB | 4.1945683e-05
10,682 | AutoPrep: Natural Language Question-Aware Data Preparation with a Multi-Agent Framework | 2025 | VLDB | 4.1945683e-05
11,216 | Demystifying the QoS and QoE of Edge-hosted Video Streaming Applications in the Wild with SNESet | 2023 | SIGMOD | 4.1945683e-05
11,476 | Enforcing Constraints for Machine Learning Systems via Declarative Feature Selection: An Experimental Study | 2021 | SIGMOD | 4.1945683e-05
11,549 | Active Reinforcement Learning for Data Preparation: Learn2Clean with Human-In-The-Loop | 2020 | CIDR | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 10 of 10 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Semantically Similar Papers