Provenance for Generalized Map and Reduce Workflows
Summary: Proposes a formal provenance model for generalized map and reduce workflows (directed acyclic graphs of map and reduce functions), supporting recursive composition and both backward and forward tracing between inputs and outputs. Implements transparent, wrapper-based provenance capture in Hadoop that preserves parallelism and fault tolerance, and reports prototype performance. (summarized by gpt-5-mini on Feb 09 2026)
Authors
1. Robert Ikeda
2. Hyunjung Park
3. Jennifer Widom
Incoming Citations (Sorted by Pagerank)
Showing 18 of 18 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 8 of 8 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 3 | Pig Latin: A Not-So-Foreign Language for Data Processing | 2008 | SIGMOD | 0.0024183614 |
| 15 | Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters | 2007 | SIGMOD | 0.0010654262 |
| 70 | Hive - A Warehousing Solution Over a Map-Reduce Framework | 2009 | VLDB | 0.00059533166 |
| 611 | Lineage Tracing for General Data Warehouse Transformations | 2001 | VLDB | 0.00019231115 |
| 923 | Provenance and Scientific Workflows: Challenges and Opportunities | 2008 | SIGMOD | 0.0001527609 |
| 1,765 | Efficient Lineage Tracking For Scientific Workflows | 2008 | SIGMOD | 0.00010630348 |
| 1,861 | Efficient Provenance Storage | 2008 | SIGMOD | 0.00010287053 |
| 1,866 | Update Exchange with Mappings and Provenance | 2007 | VLDB | 0.00010272139 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 923 | Provenance and Scientific Workflows: Challenges and Opportunities | 2008 | SIGMOD | 0.0001527609 |
| 2,028 | Putting Lipstick on Pig: Enabling Database-style Workflow Provenance | 2012 | VLDB | 0.000097433981 |
| 11,665 | Ursprung: Provenance for Large-Scale Analytics Environments | 2019 | SIGMOD | 0.000041945683 |
| 8,504 | Distributed Time-aware Provenance | 2013 | VLDB | 0.00004496125 |
| 7,132 | Enabling Privacy in Provenance-Aware Workflow Systems | 2011 | CIDR | 0.000048227603 |
| 5,843 | Tracing Lineage Beyond Relational Operators | 2007 | VLDB | 0.000053032967 |
| 611 | Lineage Tracing for General Data Warehouse Transformations | 2001 | VLDB | 0.00019231115 |
| 8,163 | Capturing and Querying Fine-grained Provenance of Preprocessing Pipelines in Data Science | 2021 | VLDB | 0.000045723431 |
| 1,765 | Efficient Lineage Tracking For Scientific Workflows | 2008 | SIGMOD | 0.00010630348 |
| 3,700 | RAMP: A System for Capturing and Tracing Provenance in MapReduce Workflows | 2011 | VLDB | 0.000068307955 |