Database Paper Browser

JoinBoost: Grow Trees Over Normalized Data Using Only SQL

Summary: JoinBoost compiles tree training into pure SQL over normalized data. It enables factorized gradient boosting by treating Y as residuals over non-materialized joins, using the variance semiring to support RMSE. It is portable to any DBMS (DuckDB demos) and reduces residual-update costs via a residual projection column. It outperforms prior systems: about 3× faster than LightGBM for random forests (RF) and more than 10× faster than prior in-DB ML systems. It scales to large schemas, many features, and complex join graphs. (summarized by gpt-5-mini on Feb 09 2026)
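
The variance-semiring idea mentioned in the summary can be illustrated with a minimal Python sketch. This is not the paper's implementation: the toy relations `R` and `S`, the helper names (`add`, `mul`, `lift`, `aggregate_by_key`, `sse`), and the use of plain Python instead of SQL are all assumptions made for illustration. The sketch annotates each tuple with a triple (count, sum of Y, sum of Y²), aggregates each relation by its join key, and combines the per-key aggregates with the semiring product, so the statistics needed for a squared-error split criterion are obtained without materializing the join.

```python
from collections import defaultdict

# Variance semiring elements are triples (count, sum, sum_of_squares).
ZERO = (0, 0.0, 0.0)   # additive identity
ONE = (1, 0.0, 0.0)    # multiplicative identity (tuple carrying no Y)

def add(x, y):
    """Semiring addition: combine statistics of two disjoint groups."""
    return (x[0] + y[0], x[1] + y[1], x[2] + y[2])

def mul(x, y):
    """Semiring multiplication: statistics of the cross product of two groups."""
    c1, s1, q1 = x
    c2, s2, q2 = y
    return (c1 * c2, s1 * c2 + s2 * c1, q1 * c2 + q2 * c1 + 2 * s1 * s2)

def lift(y):
    """Annotate a tuple that carries target value y."""
    return (1, y, y * y)

def aggregate_by_key(rows, key_index, annotate):
    """Group rows by join key and sum their annotations."""
    out = defaultdict(lambda: ZERO)
    for row in rows:
        out[row[key_index]] = add(out[row[key_index]], annotate(row))
    return out

def sse(agg):
    """Sum of squared errors around the mean, from (count, sum, sum_sq)."""
    c, s, q = agg
    return q - s * s / c

# Hypothetical toy relations: R(a, y) holds the target, S(a, b) holds a feature.
R = [(1, 2.0), (1, 4.0), (2, 10.0)]
S = [(1, "x"), (1, "y"), (2, "z")]

# Factorized: aggregate each relation separately, then multiply per join key.
r_agg = aggregate_by_key(R, 0, lambda row: lift(row[1]))
s_agg = aggregate_by_key(S, 0, lambda row: ONE)
factorized = ZERO
for a, r_val in r_agg.items():
    if a in s_agg:
        factorized = add(factorized, mul(r_val, s_agg[a]))

# Baseline: materialize the join and aggregate row by row.
materialized = ZERO
for a1, y in R:
    for a2, _b in S:
        if a1 == a2:
            materialized = add(materialized, lift(y))

print(factorized, materialized, sse(factorized))
```

Both paths yield the same triple, but the factorized path only ever touches per-key aggregates, which is the property that lets this style of computation be pushed into ordinary SQL `GROUP BY` queries.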

Paper ID: 13146
Venue: VLDB
Year: 2023
Pagerank: 4.6163888e-05
Overall Rank: 7,920 | 44.91%
DOI: 10.14778/3611479.3611509

Authors

Incoming Citations (Sorted by Pagerank)

Showing 6 of 6 citing papers.

Outgoing Citations (Sorted by Pagerank)

Showing 23 of 23 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
31 | Provenance Semirings | 2007 | PODS | 0.0007857786
140 | The MADlib Analytics Library or MAD Skills, the SQL | 2012 | VLDB | 0.00042270404
185 | DuckDB: an Embeddable Analytical Database | 2019 | SIGMOD | 0.00036538405
211 | Join Synopses for Approximate Query Answering | 1999 | SIGMOD | 0.00033981214
241 | DB2 with BLU Acceleration: So Much More than Just a Column Store | 2013 | VLDB | 0.00031420034
342 | EmptyHeaded: A Relational Engine for Graph Processing | 2016 | SIGMOD | 0.00026795977
419 | Fast Serializable Multi-Version Concurrency Control for Main-Memory Database Systems | 2015 | SIGMOD | 0.00023720338
542 | Shark: SQL and Rich Analytics at Scale | 2013 | SIGMOD | 0.00020595648
543 | MLbase: A Distributed Machine-learning System | 2013 | CIDR | 0.00020526854
557 | SystemML: Declarative Machine Learning on Spark | 2016 | VLDB | 0.00020197988
583 | FAQ: Questions Asked Frequently | 2016 | PODS | 0.00019717214
658 | Towards a Unified Architecture for in-RDBMS Analytics | 2012 | SIGMOD | 0.00018506577
834 | Learning Linear Regression Models over Factorized Joins | 2016 | SIGMOD | 0.00016135159
1,369 | Random Sampling over Joins Revisited | 2018 | SIGMOD | 0.00012339777
2,169 | AJAR: Aggregations and Joins over Annotated Relations | 2016 | PODS | 9.3845975e-05
3,006 | On Functional Aggregate Queries with Additive Inequalities | 2019 | PODS | 7.7299363e-05
3,277 | A Layered Aggregate Engine for Analytics Workloads | 2019 | SIGMOD | 7.2871625e-05
3,958 | MLog: Towards Declarative In-Database Machine Learning | 2017 | VLDB | 6.5897636e-05
4,197 | Incremental View Maintenance with Triple Lock Factorization Benefits | 2018 | SIGMOD | 6.367895e-05
5,088 | TCUDB: Accelerating Database with Tensor Processors | 2022 | SIGMOD | 5.7072189e-05
5,951 | PGMJoins: Random Join Sampling with Graphical Models | 2021 | SIGMOD | 5.2592385e-05
9,695 | Share the Tensor Tea: How Databases can Leverage the Machine Learning Ecosystem | 2022 | VLDB | 4.3025567e-05
9,706 | Distributed Numerical and Machine Learning Computations via Two-Phase Execution of Aggregated Join Trees | 2021 | VLDB | 4.2992942e-05