Doing More with Less: Characterizing Dataset Downsampling for AutoML
Summary: Downsampling large tabular data reshapes AutoML search under fixed time budgets. Empirical study of a genetic-programming AutoML search reveals tradeoffs between pipeline quality and search efficiency, guiding scalable AutoML for big data. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Fatjon Zogaj
2. José Pablo Cambronero
3. Martin C. Rinard
4. Jürgen Cito
Incoming Citations (Sorted by Pagerank)
Showing 2 of 2 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 5,026 | AutoCTS: Automated Correlated Time Series Forecasting | 2022 | VLDB | 5.7528419e-05 |
| 10,252 | CAPS: Cost-Aware ML Pipeline Selection | 2026 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 8 of 8 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 683 | Cerebro: A Data System for Optimized Deep Learning Model Selection | 2020 | VLDB | 0.00018195476 |
| 921 | Democratizing Data Science through Interactive Curation of ML Pipelines | 2019 | SIGMOD | 0.00015337438 |
| 939 | Data Lake Management: Challenges and Opportunities | 2019 | VLDB | 0.00015187344 |
| 1,666 | HELIX: Holistic Optimization for Accelerating Iterative Machine Learning | 2019 | VLDB | 0.0001096361 |
| 1,750 | Weld: A Common Runtime for High Performance Data Analytics | 2017 | CIDR | 0.00010683647 |
| 1,967 | Compressed Linear Algebra for Large-Scale Machine Learning | 2016 | VLDB | 9.9131712e-05 |
| 2,350 | An Intermediate Representation for Optimizing Machine Learning Pipelines | 2019 | VLDB | 8.9788641e-05 |
| 7,311 | The Machine Learning Bazaar: Harnessing the ML Ecosystem for Effective System Development | 2020 | SIGMOD | 4.7656884e-05 |