Towards Understanding Data Analysis Workflows using a Large Notebook Corpus
Summary: Leverages a large Jupyter notebook corpus to analyze pandas usage and extract data-analysis patterns. Constructs lineage graphs of pandas notebook workflows to enable on-the-fly optimization and synthetic workflow generation using design patterns. (summarized by gpt-5-nano on Feb 09 2026)
Incoming Non-self Citations Over Time
No non-self incoming citations found for this paper in this database.
Authors
Incoming Citations (Sorted by Pagerank)
Showing 0 of 0 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
Outgoing Citations (Sorted by Pagerank)
Showing 0 of 0 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 10,168 | FlowPilot: A Suggestion System for Designing Scientific Workflows | 2026 | SIGMOD | 4.1945683e-05 |
| 8,933 | Querying and Re-Using Workflows with VisTrails | 2008 | SIGMOD | 4.427232e-05 |
| 3,393 | Lux: Always-on Visualization Recommendations for Exploratory Dataframe Workflows | 2022 | VLDB | 7.1483239e-05 |
| 11,396 | DPDS: Assisting Data Science with Data Provenance | 2022 | VLDB | 4.1945683e-05 |
| 13,230 | Automating State Management in Computational Notebooks | 2021 | CIDR | - |
| 6,981 | Dataset Relationship Management | 2019 | CIDR | 4.8743957e-05 |
| 1,644 | Finding Related Tables in Data Lakes for Interactive Data Science | 2020 | SIGMOD | 0.00011041787 |
| 6,409 | Fine-Grained Lineage for Safer Notebook Interactions | 2021 | VLDB | 5.0756653e-05 |
| 3,252 | Auto-Suggest: Learning-to-Recommend Data Preparation Steps Using Data Science Notebooks | 2020 | SIGMOD | 7.3178277e-05 |
| 1,765 | Efficient Lineage Tracking For Scientific Workflows | 2008 | SIGMOD | 0.00010630348 |