JoinBoost: Grow Trees Over Normalized Data Using Only SQL
Summary: JoinBoost compiles tree training into pure SQL over normalized data, so the join is never materialized. It enables factorized gradient boosting by treating the target Y as residuals over the non-materialized join and using the variance semiring to support the RMSE objective. It is portable to any DBMS (demonstrated on DuckDB) and reduces residual-update costs via a residual projection column. It outperforms prior systems (≈3× faster than LightGBM for random forests and >10× faster than prior in-DB ML) and scales to large schemas, many features, and complex join graphs. (summarized by gpt-5-mini on Feb 09 2026)
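The variance semiring named in the summary can be illustrated with a small sketch. This is not JoinBoost's code or API (JoinBoost emits SQL); it is a plain-Python illustration of the algebra, and the names `Var`, `lift`, `R`, and `S` are hypothetical. It assumes two toy relations joined on a key `a`, with the target `y` stored in `R`, and shows that the (count, sum, sum of squares) aggregate over the join can be computed from per-key aggregates without building the join.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """Variance-semiring element: (count, sum of y, sum of y^2)."""
    c: float
    s: float
    q: float

    def __add__(self, o):
        # Semiring addition: combine aggregates of disjoint sets of rows.
        return Var(self.c + o.c, self.s + o.s, self.q + o.q)

    def __mul__(self, o):
        # Semiring multiplication: combine aggregates across a join.
        return Var(self.c * o.c,
                   self.s * o.c + o.s * self.c,
                   self.q * o.c + o.q * self.c + 2 * self.s * o.s)

    def sse(self):
        # Sum of squared errors around the mean; sse / c is the variance.
        return self.q - self.s ** 2 / self.c


ZERO, ONE = Var(0, 0, 0), Var(1, 0, 0)


def lift(y):
    """Annotate a single row that carries target value y."""
    return Var(1, y, y * y)


# Toy relations (assumed for illustration): R(a, y) holds the target, S(a, b) does not.
R = [(1, 2.0), (1, 4.0), (2, 5.0)]
S = [(1, "x"), (1, "y"), (2, "z")]

# Factorized: aggregate each relation per join key, then multiply and add.
r_agg = defaultdict(lambda: ZERO)
s_agg = defaultdict(lambda: ZERO)
for a, y in R:
    r_agg[a] = r_agg[a] + lift(y)
for a, _ in S:
    s_agg[a] = s_agg[a] + ONE  # rows without y carry the multiplicative identity
factorized = sum((r_agg[a] * s_agg[a] for a in r_agg if a in s_agg), ZERO)

# Baseline: materialize the join row by row and aggregate.
materialized = sum((lift(y) for a, y in R for a2, _ in S if a == a2), ZERO)
```

Both `factorized` and `materialized` hold the same (count, sum, sum of squares) triple, from which the squared-error term behind a variance-based split criterion can be derived via `sse()`.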
Authors
1. Zezhou Huang
2. Rathijit Sen
3. Jiaxiang Liu
4. Eugene Wu
Incoming Citations (Sorted by Pagerank)
Showing 6 of 6 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 6,077 | The Fast and the Private: Task-based Dataset Search | 2024 | CIDR | 5.2229324e-05 |
| 6,378 | Mitigating the Impedance Mismatch between Prediction Query Execution and Database Engine | 2025 | SIGMOD | 5.0909804e-05 |
| 8,080 | Biathlon: Harnessing Model Resilience for Accelerating ML Inference Pipelines | 2024 | VLDB | 4.5911668e-05 |
| 9,856 | In-Database Data Imputation | 2024 | SIGMOD | 4.269353e-05 |
| 10,177 | InferF: Declarative Factorization of AI/ML Inferences over Joins | 2026 | SIGMOD | 4.1945683e-05 |
| 10,571 | Quantum Data Management in the NISQ Era | 2025 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 23 of 23 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 6,241 | Scaling Similarity Joins over Tree-Structured Data | 2015 | VLDB | 5.1411469e-05 |
| 4,722 | Reducing Multidatabase Query Response Time By Tree Balancing | 1995 | SIGMOD | 5.9717332e-05 |
| 9,317 | Are Joins over LSM-trees Ready? Take RocksDB as an Example | 2025 | VLDB | 4.3556432e-05 |
| 4,417 | Robust Query Driven Cardinality Estimation under Changing Workloads | 2023 | VLDB | 6.2037371e-05 |
| 7,179 | Coresets over Multiple Tables for Feature-rich and Data-efficient Machine Learning | 2023 | VLDB | 4.8078895e-05 |
| 903 | To Join or Not to Join? Thinking Twice about Joins before Feature Selection | 2016 | SIGMOD | 0.0001547016 |
| 834 | Learning Linear Regression Models over Factorized Joins | 2016 | SIGMOD | 0.00016135159 |
| 11,220 | Lightweight Materialization for Fast Dashboards Over Joins | 2023 | SIGMOD | 4.1945683e-05 |
| 9,469 | DimBoost: Boosting Gradient Boosting Decision Tree to Higher Dimensions | 2018 | SIGMOD | 4.3342363e-05 |
| 1,167 | Learning Generalized Linear Models Over Normalized Data | 2015 | SIGMOD | 0.00013547713 |