Database Paper Browser

Data Management in Machine Learning: Challenges, Techniques, and Systems

Summary: A survey of data-management challenges and systems for ML workloads. It covers three lines of work: integrating ML with DBMSs; adapting DB techniques to ML (queries, partitioning, compression); and combining data management with ML lifecycles. It closes with open directions. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 5333
Venue: SIGMOD
Year: 2017
Pagerank: 0.00011472681
Overall Rank: 1,532 | 89.35%
DOI: 10.1145/3035918.3054775

Incoming Citations (Sorted by Pagerank)

Showing 31 of 31 citing papers.

Rank Citing Paper Year Venue Pagerank
683 Cerebro: A Data System for Optimized Deep Learning Model Selection 2020 VLDB 0.00018195476
1,940 SliceLine: Fast, Linear-Algebra-based Slice Finding for ML Model Debugging 2021 SIGMOD 0.00010020173
2,280 SMOKE: Fine-grained Lineage at Interactive Speed 2018 VLDB 9.1111033e-05
2,934 AIDA - Abstraction for Advanced In-Database Analytics 2018 VLDB 7.8595778e-05
3,145 Opportunities for Quantum Acceleration of Databases: Optimization of Queries and Transaction Schedules 2023 VLDB 7.4781724e-05
3,254 Query Processing on Tensor Computation Runtimes 2022 VLDB 7.3161051e-05
3,407 End-to-end Optimization of Machine Learning Prediction Queries 2022 SIGMOD 7.1295646e-05
3,473 AI Meets Database: AI4DB and DB4AI 2021 SIGMOD 7.062864e-05
4,033 In-RDBMS Hardware Acceleration of Advanced Analytics 2018 VLDB 6.5113267e-05
4,196 Overton: A Data System for Monitoring and Improving Machine-Learned Products 2020 CIDR 6.3686231e-05
4,197 Incremental View Maintenance with Triple Lock Factorization Benefits 2018 SIGMOD 6.367895e-05
4,607 Data Integration and Machine Learning: A Natural Synergy 2018 SIGMOD 6.0538827e-05
4,787 The Relational Data Borg is Learning 2020 VLDB 5.9224501e-05
4,833 MNC: Structure-Exploiting Sparsity Estimation for Matrix Expressions 2019 SIGMOD 5.8916346e-05
5,978 Rotom: A Meta-Learned Data Augmentation Framework for Entity Matching, Data Cleaning, Text Classification, and Beyond 2021 SIGMOD 5.2453012e-05
6,330 Efficient Construction of Approximate Ad-Hoc ML models Through Materialization and Reuse 2018 VLDB 5.1077416e-05
6,373 DeepBase: Deep Inspection of Neural Networks 2019 SIGMOD 5.0929326e-05
6,404 ColumnML: Column-Store Machine Learning with On-The-Fly Data Transformation 2019 VLDB 5.0786954e-05
6,526 Data Collection and Quality Challenges for Deep Learning 2020 VLDB 5.0267429e-05
6,645 Functional-Style SQL UDFs With a Capital 'F' 2020 SIGMOD 4.978205e-05
7,306 DAPHNE: An Open and Extensible System Infrastructure for Integrated Data Analysis Pipelines 2022 CIDR 4.7678574e-05
7,369 Using VDMS to Index and Search 100M Images 2021 VLDB 4.750437e-05
7,411 ItemSuggest: A Data Management Platform for Machine Learned Ranking Services 2019 CIDR 4.7364436e-05
8,182 SHiFT: An Efficient, Flexible Search Engine for Transfer Learning 2023 VLDB 4.5659133e-05
8,789 Machine Learning Meets Big Spatial Data 2019 VLDB 4.4509194e-05
8,864 Cerebro: A Layered Data Platform for Scalable Deep Learning 2021 CIDR 4.4326439e-05
8,980 HADAD: A Lightweight Approach for Optimizing Hybrid Complex Analytics Queries 2021 SIGMOD 4.4169807e-05
9,075 ParaX: Boosting Deep Learning for Big Data Analytics on Many-Core CPUs 2021 VLDB 4.4020349e-05
9,856 In-Database Data Imputation 2024 SIGMOD 4.269353e-05
11,339 Redundancy Elimination in Distributed Matrix Computation 2022 SIGMOD 4.1945683e-05
11,476 Enforcing Constraints for Machine Learning Systems via Declarative Feature Selection: An Experimental Study 2021 SIGMOD 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 13 of 63 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Semantically Similar Papers

Overall Rank Paper Year Venue Pagerank
9,835 Is Data Management the Beating Heart of AI Systems? 2022 SIGMOD 4.2747054e-05
4,003 Data Platform for Machine Learning 2019 SIGMOD 6.54347e-05
939 Data Lake Management: Challenges and Opportunities 2019 VLDB 0.00015187344
7,655 Machine Learning for Cloud Data Systems: the Progress so far and the Path Forward 2021 VLDB 4.6872456e-05
7,020 LLM for Data Management 2024 VLDB 4.8595728e-05
8,346 Deep Learning: Systems and Responsibility 2021 SIGMOD 4.5420668e-05
10,843 Machine Learning for Graph Data Management and Query Processing 2025 VLDB 4.1945683e-05
8,637 Machine Learning for Data Management: Problems and Solutions 2018 SIGMOD 4.479892e-05
1,420 Data Management Challenges in Production Machine Learning 2017 SIGMOD 0.00012057956
4,906 Machine Learning for Big Data 2013 SIGMOD 5.8389053e-05