Database Paper Browser


Expand your Training Limits! Generating Training Data for ML-based Data Management

Summary: DataFarm generates and labels large, heterogeneous query workloads for ML-driven data management. Using a data-driven white-box approach, it learns from a small input workload and the underlying data to synthesize new jobs, improving labeling quality by up to 9x (measured by R^2) and reducing labeling costs by up to 54x compared with prior work. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 6177
Venue: SIGMOD
Year: 2021
Pagerank: 5.0316686e-05
Overall Rank: 6,519 | 54.65%
DOI: 10.1145/3448016.3457286



Incoming Citations (Sorted by Pagerank)

Showing 9 of 9 citing papers.


Outgoing Citations (Sorted by Pagerank)

Showing 21 of 21 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
71 | How Good Are Query Optimizers, Really? | 2016 | VLDB | 0.00059038975
182 | LEO - DB2's LEarning Optimizer | 2001 | VLDB | 0.00036962631
204 | Learned Cardinalities: Estimating Correlated Joins with Deep Learning | 2019 | CIDR | 0.00034784455
254 | Snorkel: Rapid Training Data Creation with Weak Supervision | 2018 | VLDB | 0.00030540555
333 | Neo: A Learned Query Optimizer | 2019 | VLDB | 0.00027206884
509 | On Active Learning of Record Matching Packages | 2010 | SIGMOD | 0.00021409518
608 | DeepDB: Learn from Data, not from Queries! | 2020 | VLDB | 0.00019235898
659 | The Making of TPC-DS | 2006 | VLDB | 0.00018500853
758 | Deep Unsupervised Cardinality Estimation | 2020 | VLDB | 0.0001706608
806 | An End-to-End Learning-based Cost Estimator | 2020 | VLDB | 0.00016434274
884 | Plan-Structured Deep Neural Network Models for Query Performance Prediction | 2019 | VLDB | 0.00015654004
1,855 | AI Meets AI: Leveraging Query Executions to Improve Index Recommendations | 2019 | SIGMOD | 0.00010315245
2,364 | Deep Learning Models for Selectivity Estimation of Multi-Attribute Queries | 2020 | SIGMOD | 8.9554751e-05
3,142 | Active Learning for ML Enhanced Database Systems | 2020 | SIGMOD | 7.4815444e-05
3,265 | RHEEM: Enabling Cross-Platform Data Processing - May The Big Data Be With You! - | 2018 | VLDB | 7.3083672e-05
3,580 | Query Performance Prediction for Concurrent Queries using Graph Embedding | 2020 | VLDB | 6.9500996e-05
3,725 | Estimating Cardinalities with Deep Sketches | 2019 | SIGMOD | 6.8170734e-05
5,688 | PREDIcT: Towards Predicting the Runtime of Large Scale Iterative Analytics | 2013 | VLDB | 5.3702808e-05
6,278 | Uncertainty Aware Query Execution Time Prediction | 2014 | VLDB | 5.1309442e-05
8,113 | Learning Table Access Cardinalities with LEO | 2002 | SIGMOD | 4.5826944e-05
9,810 | Rheem: Enabling Multi-Platform Task Execution | 2016 | SIGMOD | 4.278405e-05

Semantically Similar Papers