Arun Kumar

Author ID: 871
ORCID: -
Most Frequent Institution: University of California San Diego
Pagerank: 0.34451867
Overall Rank: 122 | 99.42%
Paper Count: 42

Affiliation Timeline

University of California San Diego (most frequent) | 2017 - 2025 | 33 papers
University of Wisconsin | 2012 - 2016 | 9 papers

Incoming Non-self Citations Over Time

Total yearly non-self incoming citations across all papers by this author.

Publications by Paper Pagerank

Rank	Title	Year	Venue	Pagerank
139	The MADlib Analytics Library or MAD Skills, the SQL	2012	VLDB	0.00042320525
638	Towards a Unified Architecture for in-RDBMS Analytics	2012	SIGMOD	0.00018810785
684	Cerebro: A Data System for Optimized Deep Learning Model Selection	2020	VLDB	0.00018152321
758	Materialization Optimizations for Feature Selection Workloads	2014	SIGMOD	0.00017053915
901	To Join or Not to Join? Thinking Twice about Joins before Feature Selection	2016	SIGMOD	0.00015462938
1,172	Learning Generalized Linear Models Over Normalized Data	2015	SIGMOD	0.00013504249
1,283	Towards Linear Algebra over Normalized Data	2017	VLDB	0.00012826013
1,534	Data Management in Machine Learning: Challenges, Techniques, and Systems	2017	SIGMOD	0.00011462072
1,891	Towards Model-based Pricing for Machine Learning in a Data Marketplace	2019	SIGMOD	0.0001018452
2,197	Enabling and Optimizing Non-linear Feature Interactions in Factorized Linear Algebra	2019	SIGMOD	9.3117431e-05
2,871	Incremental and Approximate Inference for Faster Occlusion-based Deep CNN Explanations	2019	SIGMOD	7.9800983e-05
2,892	VISTA: Optimized System for Declarative Feature Transfer from Deep CNNs at Scale	2020	SIGMOD	7.9570135e-05
2,919	Brainwash: A Data System for Feature Engineering	2013	CIDR	7.9017482e-05
3,212	Panorama: A Data System for Unbounded Vocabulary Querying over Video	2020	VLDB	7.3772955e-05
3,640	Bolt-on Differential Privacy for Scalable Stochastic Gradient Descent-based Analytics	2017	SIGMOD	6.8886042e-05
3,953	A Comparative Evaluation of Systems for Scalable Linear Algebra-based Analytics	2018	VLDB	6.5896733e-05
4,040	In-RDBMS Hardware Acceleration of Advanced Analytics	2018	VLDB	6.5052227e-05
4,123	Are Key-Foreign Key Joins Safe to Avoid when Learning High-Capacity Classifiers?	2018	VLDB	6.4290005e-05
4,283	The future of data(base) education: Is the "cow book" dead?	2021	VLDB	6.2824978e-05
4,374	Understanding and Benchmarking the Impact of GDPR on Database Systems	2020	VLDB	6.2346849e-05
4,470	Demonstration of SpeakQL: Speech-driven Multimodal Querying of Structured Data	2019	SIGMOD	6.1525931e-05
4,601	Distributed Deep Learning on Data Systems: A Comparative Analysis of Approaches	2021	VLDB	6.05274e-05
4,794	Demonstration of Santoku: Optimizing Machine Learning over Normalized Data	2015	VLDB	5.910645e-05
5,244	Towards Benchmarking Feature Type Inference for AutoML Platforms	2021	SIGMOD	5.6021738e-05
5,448	SNAILS: Schema Naming Assessments for Improved LLM-Based SQL Inference	2025	SIGMOD	5.4980173e-05
5,535	SpeakQL: Towards Speech-driven Multimodal Querying of Structured Data	2020	SIGMOD	5.4552771e-05
6,535	Tuple-oriented Compression for Large-scale Mini-batch Stochastic Gradient Descent	2019	SIGMOD	5.0184189e-05
6,546	Demonstration of Nimbus: Model-based Pricing for Machine Learning in a Data Marketplace	2019	SIGMOD	5.0127399e-05
6,552	How do Categorical Duplicates Affect ML? A New Benchmark and Empirical Analyses	2024	VLDB	5.0109216e-05
6,885	Lotan: Bridging the Gap between GNNs and Scalable Graph Analytics Engines	2023	VLDB	4.8908367e-05
7,270	Feature Selection in Enterprise Analytics: A Demonstration using an R-based Data Analytics System	2013	VLDB	4.776491e-05
7,656	Nautilus: An Optimized System for Deep Transfer Learning over Evolving Training Datasets	2022	SIGMOD	4.6826896e-05
8,125	Automation of Data Prep, ML, and Data Science: New Cure or Snake Oil?	2021	SIGMOD	4.5765541e-05
8,377	Probabilistic Management of OCR Data using an RDBMS	2012	VLDB	4.5277353e-05
8,593	Towards A Polyglot Framework for Factorized ML	2021	VLDB	4.4846362e-05
8,864	Cerebro: A Layered Data Platform for Scalable Deep Learning	2021	CIDR	4.4283952e-05
9,225	Towards an Optimized GROUP BY Abstraction for Large-Scale Machine Learning	2021	VLDB	4.3656789e-05
9,226	Intermittent Human-in-the-Loop Model Selection using Cerebro: A Demonstration	2021	VLDB	4.3656789e-05
9,603	Saturn: An Optimized Data System for Multi-Large-Model Deep Learning Workloads	2024	VLDB	4.3136057e-05
13,185	Reimagining Deep Learning Systems Through the Lens of Data Systems	2024	VLDB	-
13,284	Errata for “Cerebro: A Data System for Optimized Deep Learning Model Selection”	2021	VLDB	-
13,326	Demonstration of Krypton: Optimized CNN Inference for Occlusion-based Deep CNN Explanations	2019	VLDB	-

Frequent Co-authors

Co-authors who share at least 5 papers with this author.

Co-author	Shared Papers	Rank	Pagerank
Supun Nakandala	8	914	0.074358505
Jeffrey Naughton	6	7	1.0624938
Christopher Ré	6	62	0.50159228
Yuhao Zhang	6	1,686	0.04349018
Jignesh Patel	5	28	0.68764149
Lingjiao Chen	5	1,624	0.045230925
Vraj Shah	5	1,626	0.045116893
Side Li	5	1,640	0.044614962