LIMA: Fine-grained Lineage Tracing and Reuse in Machine Learning Systems
Summary: LIMA provides fine-grained lineage tracing and reuse in ML systems, overcoming the limits of coarse-grained, black-box approaches. Multi-level lineage traces, deduplication for loops and functions, and cross-hierarchy reuse enable low-overhead provenance with versioning that stays compatible with task parallelism and operator fusion, delivering speedups of up to 12.4x.
(summarized by gpt-5-nano on Feb 09 2026)
- Paper ID: 6069
- Venue: SIGMOD
- Year: 2021
- Pagerank: 5.9316087e-05
- Overall Rank: 4,774 | 66.79%
- DOI: 10.1145/3448016.3452788
[Chart: Incoming Non-self Citations Over Time]
Incoming Citations (Sorted by Pagerank)
Showing 15 of 15 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 7,306 | DAPHNE: An Open and Extensible System Infrastructure for Integrated Data Analysis Pipelines | 2022 | CIDR | 4.7678574e-05 |
| 7,482 | Provenance-Enabled Explainable AI | 2024 | SIGMOD | 4.7180617e-05 |
| 7,656 | Nautilus: An Optimized System for Deep Transfer Learning over Evolving Training Datasets | 2022 | SIGMOD | 4.6871575e-05 |
| 7,704 | ExDRa: Exploratory Data Science on Federated Raw Data | 2021 | SIGMOD | 4.6733838e-05 |
| 8,092 | Saga: A Scalable Framework for Optimizing Data Cleaning Pipelines for Machine Learning Applications | 2023 | SIGMOD | 4.587921e-05 |
| 8,514 | UPLIFT: Parallelization Strategies for Feature Transformations in Machine Learning Workloads | 2022 | VLDB | 4.4944285e-05 |
| 9,806 | The Image Calculator: 10x Faster Image-AI Inference by Replacing JPEG with Self-designing Storage Format | 2024 | SIGMOD | 4.2805224e-05 |
| 9,912 | ElasticNotebook: Enabling Live Migration for Computational Notebooks | 2024 | VLDB | 4.2565279e-05 |
| 10,252 | CAPS: Cost-Aware ML Pipeline Selection | 2026 | VLDB | 4.1945683e-05 |
| 10,291 | Morphing-based Compression for Data-centric ML Pipelines | 2026 | VLDB | 4.1945683e-05 |
| 10,419 | Unified Lineage System: Tracking Data Provenance at Scale | 2025 | SIGMOD | 4.1945683e-05 |
| 10,469 | Alsatian: Optimizing Model Search for Deep Transfer Learning | 2025 | SIGMOD | 4.1945683e-05 |
| 10,628 | CatDB: Data-catalog-guided, LLM-based Generation of Data-centric ML Pipelines | 2025 | VLDB | 4.1945683e-05 |
| 10,842 | ML-Asset Management: Curation, Discovery, and Utilization | 2025 | VLDB | 4.1945683e-05 |
| 11,339 | Redundancy Elimination in Distributed Matrix Computation | 2022 | SIGMOD | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 4 of 54 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 9,222 | Towards an Optimized GROUP BY Abstraction for Large-Scale Machine Learning | 2021 | VLDB | 4.3698672e-05 |
| 8,092 | Saga: A Scalable Framework for Optimizing Data Cleaning Pipelines for Machine Learning Applications | 2023 | SIGMOD | 4.587921e-05 |
| 1,765 | Efficient Lineage Tracking For Scientific Workflows | 2008 | SIGMOD | 0.00010630348 |
| 2,350 | An Intermediate Representation for Optimizing Machine Learning Pipelines | 2019 | VLDB | 8.9788641e-05 |
| 6,291 | Lightweight Inspection of Data Preprocessing in Native Machine Learning Pipelines | 2021 | CIDR | 5.1269764e-05 |
| 8,163 | Capturing and Querying Fine-grained Provenance of Preprocessing Pipelines in Data Science | 2021 | VLDB | 4.5723431e-05 |
| 3,918 | On Optimizing Operator Fusion Plans for Large-Scale Machine Learning in SystemML | 2018 | VLDB | 6.6315176e-05 |
| 6,053 | Optimizing Machine Learning Workloads in Collaborative Environments | 2020 | SIGMOD | 5.2326838e-05 |
| 2,456 | Production Machine Learning Pipelines: Empirical Analysis and Optimization Opportunities | 2021 | SIGMOD | 8.7733773e-05 |
| 6,469 | Materialization and Reuse Optimizations for Production Data Science Pipelines | 2022 | SIGMOD | 5.0519488e-05 |