Dagger: A Data (not code) Debugger
Summary: Dagger is an end-to-end data (not code) debugger that exposes data-centric primitives for interactive what-if analyses and targeted fixes on script-based analytics pipelines. It is distinguished by abstracting transformations and supporting lineage-aware interventions to speed up diagnosis and repair. A prototype was deployed in Data Civilizer 2.0 for clinical pipelines. (summarized by gpt-5-mini on Feb 09 2026)
Authors
1. El Kindi Rezig
2. Lei Cao
3. Giovanni Simonini
4. Maxime Schoemans
5. Samuel Madden
6. Mourad Ouzzani
7. Nan Tang
8. Michael Stonebraker
Incoming Citations (Sorted by Pagerank)
Showing 11 of 11 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 5,153 | Horizon: Scalable Dependency-driven Data Cleaning | 2021 | VLDB | 5.6607963e-05 |
| 6,291 | Lightweight Inspection of Data Preprocessing in Native Machine Learning Pipelines | 2021 | CIDR | 5.1269764e-05 |
| 6,944 | DataPrism: Exposing Disconnect between Data and Systems | 2022 | SIGMOD | 4.8912787e-05 |
| 7,303 | DICE: Data Discovery by Example | 2021 | VLDB | 4.7684686e-05 |
| 9,118 | Towards Observability for Production Machine Learning Pipelines | 2022 | VLDB | 4.3928288e-05 |
| 9,253 | Glean: Structured Extractions from Templatic Documents | 2021 | VLDB | 4.3690661e-05 |
| 9,306 | Debugging Large-Scale Data Science Pipelines using Dagger | 2020 | VLDB | 4.3572942e-05 |
| 9,984 | Towards Scalable Visual Data Wrangling via Direct Manipulation | 2026 | CIDR | 4.1945683e-05 |
| 10,828 | Buckaroo: A Direct Manipulation Visual Data Wrangler | 2025 | VLDB | 4.1945683e-05 |
| 11,313 | Towards Observability for Machine Learning Pipelines | 2022 | CIDR | 4.1945683e-05 |
| 13,232 | Data Cleaning in the Era of Data Science: Challenges and Opportunities | 2021 | CIDR | - |
Outgoing Citations (Sorted by Pagerank)
Showing 7 of 7 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1,565 | Principles of Dataset Versioning: Exploring the Recreation/Storage Tradeoff | 2015 | VLDB | 1.1345567e-04 |
| 2,037 | OrpheusDB: Bolt-on Versioning for Relational Databases | 2017 | VLDB | 9.7120139e-05 |
| 2,152 | MISTIQUE: A System to Store and Query Model Intermediates for Model Diagnosis | 2018 | SIGMOD | 9.4239787e-05 |
| 2,430 | Decibel: The Relational Dataset Branching System | 2016 | VLDB | 8.8330417e-05 |
| 2,825 | Smile: A System to Support Machine Learning on EEG Data at Scale | 2019 | VLDB | 8.0563426e-05 |
| 3,023 | Helix: Accelerating Human-in-the-loop Machine Learning | 2018 | VLDB | 7.6929986e-05 |
| 8,000 | Data Civilizer 2.0: A Holistic Framework for Data Preparation and Analytics | 2019 | VLDB | 4.6092803e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 5,341 | Inspector Gadget: A Framework for Custom Monitoring and Debugging of Distributed Dataflows | 2011 | SIGMOD | 5.5607484e-05 |
| 10,820 | APEX-DAG: Library and Language independent Pipeline EXtraction | 2025 | VLDB | 4.1945683e-05 |
| 8,341 | BugDoc: Algorithms to Debug Computational Processes | 2020 | SIGMOD | 4.5433282e-05 |
| 11,147 | Reconstructing and Querying ML Pipeline Intermediates | 2023 | CIDR | 4.1945683e-05 |
| 1,277 | The Data Civilizer System | 2017 | CIDR | 1.2879695e-04 |
| 5,058 | A Demo of the Data Civilizer System | 2017 | SIGMOD | 5.7280139e-05 |
| 9,220 | BugDoc: A System for Debugging Computational Pipelines | 2020 | SIGMOD | 4.3702188e-05 |
| 4,426 | Data Debugging and Exploration with Vizier | 2019 | SIGMOD | 6.1969994e-05 |
| 8,000 | Data Civilizer 2.0: A Holistic Framework for Data Preparation and Analytics | 2019 | VLDB | 4.6092803e-05 |
| 9,306 | Debugging Large-Scale Data Science Pipelines using Dagger | 2020 | VLDB | 4.3572942e-05 |