Database Paper Browser

Coresets over Multiple Tables for Feature-rich and Data-efficient Machine Learning

Summary: The paper pushes coreset weighted-gradient computation into per-table partial feature-similarity sketches, so coresets can be selected without materializing the feature-augmented join. It proves upper bounds on the aggregated gradient approximation error and reports ~100x speedups with negligible accuracy loss. (summarized by gpt-5-mini on Feb 09 2026)

Paper ID: 13318
Venue: VLDB
Year: 2023
Pagerank: 4.8078895e-05
Overall Rank: 7,179 | 50.06%
DOI: 10.14778/3561261.3561267

Incoming Citations (Sorted by Pagerank)

Showing 8 of 8 citing papers.

Outgoing Citations (Sorted by Pagerank)

Showing 25 of 25 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
71 | How Good Are Query Optimizers, Really? | 2016 | VLDB | 0.00059038975
140 | The MADlib Analytics Library or MAD Skills, the SQL | 2012 | VLDB | 0.00042270404
168 | MAD Skills: New Analysis Practices for Big Data | 2009 | VLDB | 0.00038946305
557 | SystemML: Declarative Machine Learning on Spark | 2016 | VLDB | 0.00020197988
640 | Bao: Making Learned Query Optimization Practical | 2021 | SIGMOD | 0.00018759152
834 | Learning Linear Regression Models over Factorized Joins | 2016 | SIGMOD | 0.00016135159
850 | Scaling Factorization Machines to Relational Data | 2013 | VLDB | 0.00015955971
903 | To Join or Not to Join? Thinking Twice about Joins before Feature Selection | 2016 | SIGMOD | 0.0001547016
910 | NeuroCard: One Cardinality Estimator for All Tables | 2021 | VLDB | 0.00015423056
1,167 | Learning Generalized Linear Models Over Normalized Data | 2015 | SIGMOD | 0.00013547713
1,279 | Towards Linear Algebra over Normalized Data | 2017 | VLDB | 0.00012868394
1,369 | Random Sampling over Joins Revisited | 2018 | SIGMOD | 0.00012339777
1,463 | ARDA: Automatic Relational Data Augmentation for Machine Learning | 2020 | VLDB | 0.00011869295
2,194 | Enabling and Optimizing Non-linear Feature Interactions in Factorized Linear Algebra | 2019 | SIGMOD | 9.3138337e-05
4,129 | Are Key-Foreign Key Joins Safe to Avoid when Learning High-Capacity Classifiers? | 2018 | VLDB | 6.428887e-05
4,159 | F: Regression Models over Factorized Views | 2016 | VLDB | 6.3993326e-05
4,543 | FACE: A Normalizing Flow based Cardinality Estimator | 2022 | VLDB | 6.1011198e-05
4,785 | Demonstration of Santoku: Optimizing Machine Learning over Normalized Data | 2015 | VLDB | 5.9236989e-05
5,362 | Cost-Effective Crowdsourced Entity Resolution: A Partial-Order Approach | 2016 | SIGMOD | 5.5473503e-05
5,381 | Selective Data Acquisition in the Wild for Model Charging | 2022 | VLDB | 5.5399508e-05
5,963 | Automatic Data Acquisition for Deep Learning | 2021 | VLDB | 5.2526794e-05
7,575 | Human-in-the-loop Outlier Detection | 2020 | SIGMOD | 4.7068909e-05
8,595 | Towards A Polyglot Framework for Factorized ML | 2021 | VLDB | 4.4889397e-05
11,582 | Interactively Discovering and Ranking Desired Tuples without Writing SQL Queries | 2020 | SIGMOD | 4.1945683e-05
11,788 | CDB: Optimizing Queries with Crowd-Based Selections and Joins | 2017 | SIGMOD | 4.1945683e-05

Semantically Similar Papers