Database Paper Browser

Data Management in Machine Learning: Challenges, Techniques, and Systems

Summary: A survey of data-management challenges and systems for ML workloads. It covers three lines of work: integrating ML with DBMSs; adapting database techniques to ML (queries, partitioning, compression); and combining data management with ML lifecycles. It also outlines open directions. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 5333
Venue: SIGMOD
Year: 2017
Pagerank: 0.00011472681
Overall Rank: 1,532 | 89.35%
DOI: 10.1145/3035918.3054775

Incoming Citations (Sorted by Pagerank)

Showing 31 of 31 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
683 | Cerebro: A Data System for Optimized Deep Learning Model Selection | 2020 | VLDB | 0.00018195476
1,940 | SliceLine: Fast, Linear-Algebra-based Slice Finding for ML Model Debugging | 2021 | SIGMOD | 0.00010020173
2,280 | SMOKE: Fine-grained Lineage at Interactive Speed | 2018 | VLDB | 9.1111033e-05
2,934 | AIDA - Abstraction for Advanced In-Database Analytics | 2018 | VLDB | 7.8595778e-05
3,145 | Opportunities for Quantum Acceleration of Databases: Optimization of Queries and Transaction Schedules | 2023 | VLDB | 7.4781724e-05
3,254 | Query Processing on Tensor Computation Runtimes | 2022 | VLDB | 7.3161051e-05
3,407 | End-to-end Optimization of Machine Learning Prediction Queries | 2022 | SIGMOD | 7.1295646e-05
3,473 | AI Meets Database: AI4DB and DB4AI | 2021 | SIGMOD | 7.062864e-05
4,033 | In-RDBMS Hardware Acceleration of Advanced Analytics | 2018 | VLDB | 6.5113267e-05
4,196 | Overton: A Data System for Monitoring and Improving Machine-Learned Products | 2020 | CIDR | 6.3686231e-05
4,197 | Incremental View Maintenance with Triple Lock Factorization Benefits | 2018 | SIGMOD | 6.367895e-05
4,607 | Data Integration and Machine Learning: A Natural Synergy | 2018 | SIGMOD | 6.0538827e-05
4,787 | The Relational Data Borg is Learning | 2020 | VLDB | 5.9224501e-05
4,833 | MNC: Structure-Exploiting Sparsity Estimation for Matrix Expressions | 2019 | SIGMOD | 5.8916346e-05
5,978 | Rotom: A Meta-Learned Data Augmentation Framework for Entity Matching, Data Cleaning, Text Classification, and Beyond | 2021 | SIGMOD | 5.2453012e-05
6,330 | Efficient Construction of Approximate Ad-Hoc ML models Through Materialization and Reuse | 2018 | VLDB | 5.1077416e-05
6,373 | DeepBase: Deep Inspection of Neural Networks | 2019 | SIGMOD | 5.0929326e-05
6,404 | ColumnML: Column-Store Machine Learning with On-The-Fly Data Transformation | 2019 | VLDB | 5.0786954e-05
6,526 | Data Collection and Quality Challenges for Deep Learning | 2020 | VLDB | 5.0267429e-05
6,645 | Functional-Style SQL UDFs With a Capital 'F' | 2020 | SIGMOD | 4.978205e-05
7,306 | DAPHNE: An Open and Extensible System Infrastructure for Integrated Data Analysis Pipelines | 2022 | CIDR | 4.7678574e-05
7,369 | Using VDMS to Index and Search 100M Images | 2021 | VLDB | 4.750437e-05
7,411 | ItemSuggest: A Data Management Platform for Machine Learned Ranking Services | 2019 | CIDR | 4.7364436e-05
8,182 | SHiFT: An Efficient, Flexible Search Engine for Transfer Learning | 2023 | VLDB | 4.5659133e-05
8,789 | Machine Learning Meets Big Spatial Data | 2019 | VLDB | 4.4509194e-05
8,864 | Cerebro: A Layered Data Platform for Scalable Deep Learning | 2021 | CIDR | 4.4326439e-05
8,980 | HADAD: A Lightweight Approach for Optimizing Hybrid Complex Analytics Queries | 2021 | SIGMOD | 4.4169807e-05
9,075 | ParaX: Boosting Deep Learning for Big Data Analytics on Many-Core CPUs | 2021 | VLDB | 4.4020349e-05
9,856 | In-Database Data Imputation | 2024 | SIGMOD | 4.269353e-05
11,339 | Redundancy Elimination in Distributed Matrix Computation | 2022 | SIGMOD | 4.1945683e-05
11,476 | Enforcing Constraints for Machine Learning Systems via Declarative Feature Selection: An Experimental Study | 2021 | SIGMOD | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 50 of 63 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
37 | Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud | 2012 | VLDB | 0.0007522744
140 | The MADlib Analytics Library or MAD Skills, the SQL | 2012 | VLDB | 0.00042270404
168 | MAD Skills: New Analysis Practices for Big Data | 2009 | VLDB | 0.00038946305
359 | Self-Driving Database Management Systems | 2017 | CIDR | 0.0002592783
469 | MauveDB: Supporting Model-based User Views in Database Systems | 2006 | SIGMOD | 0.00022406923
543 | MLbase: A Distributed Machine-learning System | 2013 | CIDR | 0.00020526854
557 | SystemML: Declarative Machine Learning on Spark | 2016 | VLDB | 0.00020197988
583 | FAQ: Questions Asked Frequently | 2016 | PODS | 0.00019717214
656 | ERACER: A Database Approach for Statistical Inference and Data Cleaning | 2010 | SIGMOD | 0.00018588729
658 | Towards a Unified Architecture for in-RDBMS Analytics | 2012 | SIGMOD | 0.00018506577
667 | Incremental Knowledge Base Construction Using DeepDive | 2015 | VLDB | 0.00018440557
734 | The TileDB Array Data Storage Manager | 2017 | VLDB | 0.00017455248
761 | Materialization Optimizations for Feature Selection Workloads | 2014 | SIGMOD | 0.00017053783
791 | ActiveClean: Interactive Data Cleaning For Statistical Modeling | 2016 | VLDB | 0.00016629664
834 | Learning Linear Regression Models over Factorized Joins | 2016 | SIGMOD | 0.00016135159
850 | Scaling Factorization Machines to Relational Data | 2013 | VLDB | 0.00015955971
903 | To Join or Not to Join? Thinking Twice about Joins before Feature Selection | 2016 | SIGMOD | 0.0001547016
1,014 | Tuffy: Scaling up Statistical Inference in Markov Logic Networks using an RDBMS | 2011 | VLDB | 0.00014640258
1,044 | DimmWitted: A Study of Main-Memory Statistical Analytics | 2014 | VLDB | 0.00014475229
1,071 | Starfish: A Self-tuning System for Big Data Analytics | 2011 | CIDR | 0.00014312777
1,076 | RIOT: I/O-Efficient Numerical Computing without SQL | 2009 | CIDR | 0.00014248449
1,158 | Simulation of Database-Valued Markov Chains Using SimSQL | 2013 | SIGMOD | 0.0001361064
1,167 | Learning Generalized Linear Models Over Normalized Data | 2015 | SIGMOD | 0.00013547713
1,279 | Towards Linear Algebra over Normalized Data | 2017 | VLDB | 0.00012868394
1,402 | Hybrid Parallelization Strategies for Large-Scale Machine Learning in SystemML | 2014 | VLDB | 0.00012180605
1,873 | An Architecture for Compiling UDF-centric Workflows | 2015 | VLDB | 0.00010253002
1,876 | ArrayStore: A Storage Manager for Complex Parallel Array Processing | 2011 | SIGMOD | 0.00010239284
1,967 | Compressed Linear Algebra for Large-Scale Machine Learning | 2016 | VLDB | 9.9131712e-05
2,084 | The Case for Predictive Database Systems: Opportunities and Challenges | 2011 | CIDR | 9.5820534e-05
2,126 | MacroBase: Prioritizing Attention in Fast Data | 2017 | SIGMOD | 9.4887794e-05
2,172 | Spinning Fast Iterative Data Flows | 2012 | VLDB | 9.3706587e-05
2,251 | Vizdom: Interactive Analytics through Pen and Touch | 2015 | VLDB | 9.1986441e-05
2,255 | LINVIEW: Incremental View Maintenance for Complex Analytical Queries | 2014 | SIGMOD | 9.1884983e-05
2,307 | On Predictive Modeling for Optimizing Transaction Execution in Parallel OLTP Systems | 2012 | VLDB | 9.0599752e-05
2,623 | GenBase: A Complex Analytics Genomics Benchmark | 2014 | SIGMOD | 8.4374366e-05
2,667 | Cumulon: Optimizing Statistical Data Analysis in the Cloud | 2013 | SIGMOD | 8.3413995e-05
2,818 | Implicit Parallelism through Deep Language Embedding | 2015 | SIGMOD | 8.0665558e-05
2,915 | Brainwash: A Data System for Feature Engineering | 2013 | CIDR | 7.9078385e-05
3,216 | WiSeDB: A Learning-based Workload Management Advisor for Cloud Databases | 2016 | VLDB | 7.3601267e-05
3,445 | Processing Forecasting Queries | 2007 | VLDB | 7.08644e-05
3,455 | A Comparison of Platforms for Implementing and Running Very Large Scale Machine Learning Algorithms | 2014 | SIGMOD | 7.0771839e-05
3,617 | Ava: From Data to Insights Through Conversation | 2017 | CIDR | 6.9091789e-05
4,077 | Towards High-Throughput Gibbs Sampling at Scale: A Study across Storage Managers | 2013 | SIGMOD | 6.4678697e-05
4,259 | Optimizing I/O for Big Array Analytics | 2012 | VLDB | 6.3147285e-05
4,505 | SPOOF: Sum-Product Optimization and Operator Fusion for Large-Scale Machine Learning | 2017 | CIDR | 6.1327108e-05
4,576 | The Missing Piece in Complex Analytics: Low Latency, Scalable Model Management and Serving with Velox | 2015 | CIDR | 6.0721464e-05
4,785 | Demonstration of Santoku: Optimizing Machine Learning over Normalized Data | 2015 | VLDB | 5.9236989e-05
4,802 | Resource Elasticity for Large-Scale Machine Learning | 2015 | SIGMOD | 5.9114415e-05
4,906 | Machine Learning for Big Data | 2013 | SIGMOD | 5.8389053e-05
5,294 | GLADE: Big Data Analytics Made Easy | 2012 | SIGMOD | 5.5810654e-05

Semantically Similar Papers

Overall Rank | Paper | Year | Venue | Pagerank
9,835 | Is Data Management the Beating Heart of AI Systems? | 2022 | SIGMOD | 4.2747054e-05
4,003 | Data Platform for Machine Learning | 2019 | SIGMOD | 6.54347e-05
939 | Data Lake Management: Challenges and Opportunities | 2019 | VLDB | 0.00015187344
7,655 | Machine Learning for Cloud Data Systems: the Progress so far and the Path Forward | 2021 | VLDB | 4.6872456e-05
7,020 | LLM for Data Management | 2024 | VLDB | 4.8595728e-05
8,346 | Deep Learning: Systems and Responsibility | 2021 | SIGMOD | 4.5420668e-05
10,843 | Machine Learning for Graph Data Management and Query Processing | 2025 | VLDB | 4.1945683e-05
8,637 | Machine Learning for Data Management: Problems and Solutions | 2018 | SIGMOD | 4.479892e-05
1,420 | Data Management Challenges in Production Machine Learning | 2017 | SIGMOD | 0.00012057956
4,906 | Machine Learning for Big Data | 2013 | SIGMOD | 5.8389053e-05