Generating Example Data for Dataflow Programs
Summary: Generates small, semantically faithful example data for the intermediate steps of a dataflow program, so that the program's semantics can be illustrated without producing full outputs. Handles highly selective and noninvertible operators with dedicated data generation techniques, validated on real Yahoo!-scale dataflow workloads. (summarized by gpt-5-nano on Feb 09 2026)
Incoming Citations (Sorted by Pagerank)
Showing 11 of 11 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 780 | Building a High-Level Dataflow System on top of Map-Reduce: The Pig Experience | 2009 | VLDB | 0.00016775082 |
| 2,124 | Characterizing Schema Mappings via Data Examples | 2010 | PODS | 9.4912951e-05 |
| 2,291 | Data Generation using Declarative Constraints | 2011 | SIGMOD | 9.0926719e-05 |
| 3,866 | Designing and Refining Schema Mappings via Data Examples | 2011 | SIGMOD | 6.6837e-05 |
| 4,330 | Mining Top-K Large Structural Patterns in a Massive Network | 2011 | VLDB | 6.2839861e-05 |
| 4,517 | Generating Databases for Query Workloads | 2010 | VLDB | 6.1178732e-05 |
| 5,452 | QueryVis: Logic-based Diagrams help Users Understand Complicated SQL Queries Faster | 2020 | SIGMOD | 5.4999397e-05 |
| 5,733 | Explaining Wrong Queries Using Small Examples | 2019 | SIGMOD | 5.3483446e-05 |
| 6,821 | Hadoop's Adolescence: An analysis of Hadoop usage in scientific workloads | 2013 | VLDB | 4.9156923e-05 |
| 8,111 | Databases will Visualize Queries too | 2011 | VLDB | 4.5842786e-05 |
| 8,954 | Understanding Queries by Conditional Instances | 2022 | SIGMOD | 4.4221863e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 5 of 5 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 3 | Pig Latin: A Not-So-Foreign Language for Data Processing | 2008 | SIGMOD | 0.0024183614 |
| 18 | On Random Sampling over Joins | 1999 | SIGMOD | 0.00092385438 |
| 888 | QAGen: Generating Query-Aware Test Databases | 2007 | SIGMOD | 0.00015578618 |
| 949 | Tioga: Providing Data Management Support for Scientific Visualization Applications | 1993 | VLDB | 0.00015111638 |
| 4,638 | Test Data for Relational Queries (Extended abstract) | 1986 | PODS | 6.0291138e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 3,710 | Optimizing Analytic Data Flows for Multiple Execution Engines | 2012 | SIGMOD | 6.8238962e-05 |
| 6,237 | New Trends on Exploratory Methods for Data Analytics | 2017 | VLDB | 5.1435341e-05 |
| 10,118 | Test Data Generation for Complex SQL Queries | 2026 | SIGMOD | 4.1945683e-05 |
| 5,117 | Sampling Algorithms in a Stream Operator | 2005 | SIGMOD | 5.6825418e-05 |
| 8,344 | Exploring the Data Wilderness through Examples | 2019 | SIGMOD | 4.5428111e-05 |
| 3,866 | Designing and Refining Schema Mappings via Data Examples | 2011 | SIGMOD | 6.6837e-05 |
| 538 | The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing | 2015 | VLDB | 0.00020678804 |
| 1,065 | Data-Driven Understanding and Refinement of Schema Mappings | 2001 | SIGMOD | 0.00014338146 |
| 2,611 | Opening the Black Boxes in Data Flow Optimization | 2012 | VLDB | 8.4536967e-05 |
| 5,209 | Explaining Outputs in Modern Data Analytics | 2016 | VLDB | 5.629362e-05 |