Database Paper Browser

LIMA: Fine-grained Lineage Tracing and Reuse in Machine Learning Systems

Summary: LIMA provides fine-grained lineage tracing and reuse in ML systems, moving past the limits of coarse-grained, black-box approaches. Multi-level traces, loop and function deduplication, and cross-hierarchy reuse enable low-overhead provenance with versioning that stays compatible with task parallelism and operator fusion, delivering speedups of up to 12.4x. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 6069
Venue: SIGMOD
Year: 2021
Pagerank: 5.9316087e-05
Overall Rank: 4,774 | 66.79%
DOI: 10.1145/3448016.3452788

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 15 of 15 citing papers.

Rank Citing Paper Year Venue Pagerank
7,306 DAPHNE: An Open and Extensible System Infrastructure for Integrated Data Analysis Pipelines 2022 CIDR 4.7678574e-05
7,482 Provenance-Enabled Explainable AI 2024 SIGMOD 4.7180617e-05
7,656 Nautilus: An Optimized System for Deep Transfer Learning over Evolving Training Datasets 2022 SIGMOD 4.6871575e-05
7,704 ExDRa: Exploratory Data Science on Federated Raw Data 2021 SIGMOD 4.6733838e-05
8,092 Saga: A Scalable Framework for Optimizing Data Cleaning Pipelines for Machine Learning Applications 2023 SIGMOD 4.587921e-05
8,514 UPLIFT: Parallelization Strategies for Feature Transformations in Machine Learning Workloads 2022 VLDB 4.4944285e-05
9,806 The Image Calculator: 10x Faster Image-AI Inference by Replacing JPEG with Self-designing Storage Format 2024 SIGMOD 4.2805224e-05
9,912 ElasticNotebook: Enabling Live Migration for Computational Notebooks 2024 VLDB 4.2565279e-05
10,252 CAPS: Cost-Aware ML Pipeline Selection 2026 VLDB 4.1945683e-05
10,291 Morphing-based Compression for Data-centric ML Pipelines 2026 VLDB 4.1945683e-05
10,419 Unified Lineage System: Tracking Data Provenance at Scale 2025 SIGMOD 4.1945683e-05
10,469 Alsatian: Optimizing Model Search for Deep Transfer Learning 2025 SIGMOD 4.1945683e-05
10,628 CatDB: Data-catalog-guided, LLM-based Generation of Data-centric ML Pipelines 2025 VLDB 4.1945683e-05
10,842 ML-Asset Management: Curation, Discovery, and Utilization 2025 VLDB 4.1945683e-05
11,339 Redundancy Elimination in Distributed Matrix Computation 2022 SIGMOD 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 50 of 54 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank Cited Paper Year Venue Pagerank
3 Pig Latin: A Not-So-Foreign Language for Data Processing 2008 SIGMOD 0.0024183614
31 Provenance Semirings 2007 PODS 0.0007857786
158 Automated Selection of Materialized Views and Indexes for SQL Databases 2000 VLDB 0.00040071492
179 Efficient and Extensible Algorithms for Multi Query Optimization 2000 SIGMOD 0.00037672155
408 Database Cracking 2007 CIDR 0.00023953844
469 MauveDB: Supporting Model-based User Views in Database Systems 2006 SIGMOD 0.00022406923
487 Why Not? 2009 SIGMOD 0.00022050218
557 SystemML: Declarative Machine Learning on Spark 2016 VLDB 0.00020197988
610 Goods: Organizing Google's Datasets 2016 SIGMOD 0.00019232674
667 Incremental Knowledge Base Construction Using DeepDive 2015 VLDB 0.00018440557
761 Materialization Optimizations for Feature Selection Workloads 2014 SIGMOD 0.00017053783
921 Democratizing Data Science through Interactive Curation of ML Pipelines 2019 SIGMOD 0.00015337438
1,281 DataHub: Collaborative Data Science & Dataset Version Management at Scale 2015 CIDR 0.00012854744
1,299 The DataPath System: A Data-Centric Analytic Processing Engine for Large Data Warehouses 2010 SIGMOD 0.00012751522
1,402 Hybrid Parallelization Strategies for Large-Scale Machine Learning in SystemML 2014 VLDB 0.00012180605
1,413 VisTrails: Visualization meets Data Management 2006 SIGMOD 0.00012121257
1,440 Provenance for Generalized Map and Reduce Workflows 2011 CIDR 0.00011961469
1,476 Efficient Exploitation of Similar Subexpressions for Query Processing 2007 SIGMOD 0.00011779092
1,666 HELIX: Holistic Optimization for Accelerating Iterative Machine Learning 2019 VLDB 0.0001096361
1,866 Update Exchange with Mappings and Provenance 2007 VLDB 0.00010272139
1,873 An Architecture for Compiling UDF-centric Workflows 2015 VLDB 0.00010253002
1,922 Selecting Subexpressions to Materialize at Datacenter Scale 2018 VLDB 0.00010082599
2,027 Titian: Data Provenance Support in Spark 2016 VLDB 9.7437067e-05
2,028 Putting Lipstick on Pig: Enabling Database-style Workflow Provenance 2012 VLDB 9.7433981e-05
2,122 SystemDS: A Declarative Machine Learning System for the End-to-End Data Science Lifecycle 2020 CIDR 9.4989076e-05
2,152 MISTIQUE: A System to Store and Query Model Intermediates for Model Diagnosis 2018 SIGMOD 9.4239787e-05
2,163 Elastic Machine Learning Algorithms in Amazon SageMaker 2020 SIGMOD 9.3949234e-05
2,255 LINVIEW: Incremental View Maintenance for Complex Analytical Queries 2014 SIGMOD 9.1884983e-05
2,280 SMOKE: Fine-grained Lineage at Interactive Speed 2018 VLDB 9.1111033e-05
2,350 An Intermediate Representation for Optimizing Machine Learning Pipelines 2019 VLDB 8.9788641e-05
2,359 Data Market Platforms: Trading Data Assets to Solve Data Problems 2020 VLDB 8.9607667e-05
2,372 Predictable Performance for Unpredictable Workloads 2009 VLDB 8.947963e-05
2,667 Cumulon: Optimizing Statistical Data Analysis in the Cloud 2013 SIGMOD 8.3413995e-05
2,693 An Architecture for Recycling Intermediates in a Column-store 2009 SIGMOD 8.2883398e-05
2,863 Incremental and Approximate Inference for Faster Occlusion-based Deep CNN Explanations 2019 SIGMOD 7.9877991e-05
2,896 Evaluating End-to-End Optimization for Data Analytics Applications in Weld 2018 VLDB 7.9452051e-05
3,149 Fine-Grained, Secure and Efficient Data Provenance on Blockchain Systems 2019 VLDB 7.4741595e-05
3,875 Cloudy with High Chance of DBMS: A 10-year Prediction for Enterprise-Grade ML 2020 CIDR 6.675257e-05
3,918 On Optimizing Operator Fusion Plans for Large-Scale Machine Learning in SystemML 2018 VLDB 6.6315176e-05
4,197 Incremental View Maintenance with Triple Lock Factorization Benefits 2018 SIGMOD 6.367895e-05
4,505 SPOOF: Sum-Product Optimization and Operator Fusion for Large-Scale Machine Learning 2017 CIDR 6.1327108e-05
4,576 The Missing Piece in Complex Analytics: Low Latency, Scalable Model Management and Serving with Velox 2015 CIDR 6.0721464e-05
4,595 Juneau: Data Lake Management for Jupyter 2019 VLDB 6.060188e-05
4,607 Data Integration and Machine Learning: A Natural Synergy 2018 SIGMOD 6.0538827e-05
5,433 "Amnesia" - A Selection of Machine Learning Models That Can Forget User Data Very Fast 2020 CIDR 5.5051607e-05
5,487 SPORES: Sum-Product Optimization via Relational Equality Saturation for Large Scale Linear Algebra 2020 VLDB 5.4791501e-05
5,874 Incrementally Maintaining Classification using an RDBMS 2011 VLDB 5.2930628e-05
6,053 Optimizing Machine Learning Workloads in Collaborative Environments 2020 SIGMOD 5.2326838e-05
6,291 Lightweight Inspection of Data Preprocessing in Native Machine Learning Pipelines 2021 CIDR 5.1269764e-05
6,295 Your notebook is not crumby enough, REPLace it 2020 CIDR 5.1249204e-05

Semantically Similar Papers