spade: Synthesizing Data Quality Assertions for Large Language Model Pipelines
Summary: spade synthesizes data-quality assertions for LLM pipelines by mining prompt-version histories to generate candidate assertion functions, then selecting a minimal set that meets coverage and accuracy constraints. It yields fewer assertions and ~21% fewer false failures than simpler baselines, and it is deployed in LangSmith. (summarized by gpt-5-mini on Feb 09 2026)
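The selection step described in the summary can be illustrated as a small constrained subset search. This is a minimal sketch under stated assumptions. Each candidate assertion is a Python predicate that returns True when an output passes. Outputs carry known good/bad labels. Coverage is the fraction of bad outputs flagged by at least one selected assertion, and the false-failure rate is the fraction of good outputs flagged. The function `select_minimal_assertions`, its parameters, and the toy assertions are hypothetical illustrations, not spade's API. Candidate generation from prompt-version histories is not shown. Brute-force enumeration stands in for whatever optimization procedure the paper actually uses.

```python
from itertools import combinations
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical assertion type: returns True if the LLM output passes the check.
Assertion = Callable[[str], bool]


def select_minimal_assertions(
    candidates: Dict[str, Assertion],
    outputs: Sequence[str],
    is_bad: Sequence[bool],
    max_false_failure_rate: float,
    min_coverage: float = 1.0,
) -> Optional[List[str]]:
    """Return a smallest subset of assertion names meeting both constraints, or None.

    Sketch only: exhaustive search is exact but exponential in len(candidates).
    """
    bad_idx = [i for i, bad in enumerate(is_bad) if bad]
    good_idx = [i for i, bad in enumerate(is_bad) if not bad]
    # Precompute which outputs each assertion flags (i.e., fails on).
    flags = {
        name: {i for i, out in enumerate(outputs) if not fn(out)}
        for name, fn in candidates.items()
    }
    names = sorted(candidates)
    # Try subsets in increasing size, so the first feasible one is minimal.
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            flagged = set().union(*(flags[n] for n in subset))
            coverage = len(flagged.intersection(bad_idx)) / len(bad_idx) if bad_idx else 1.0
            ffr = len(flagged.intersection(good_idx)) / len(good_idx) if good_idx else 0.0
            if coverage >= min_coverage and ffr <= max_false_failure_rate:
                return list(subset)
    return None


# Toy candidates standing in for LLM-generated assertion functions (hypothetical).
candidates = {
    "starts_with_brace": lambda s: s.strip().startswith("{"),
    "contains_brace": lambda s: "{" in s,
    "no_apology": lambda s: "sorry" not in s.lower(),
}
outputs = [
    '{"answer": 42}',          # good
    "Sorry, I cannot help.",   # bad
    "plain text answer",       # bad
    'Answer: {"answer": 1}',   # good (prose prefix is acceptable here)
]
is_bad = [False, True, True, False]

print(select_minimal_assertions(candidates, outputs, is_bad, max_false_failure_rate=0.25))
```

In this toy run, `starts_with_brace` also covers both bad outputs, but it flags the good output at index 3, so its false-failure rate (0.5) exceeds the 0.25 budget. The search therefore returns the single assertion `contains_brace`.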
Authors
1. Shreya Shankar
2. Haotian Li
3. Parth Asawa
4. Madelon Hulsebos
5. Yiming Lin
6. J.D. Zamfirescu-Pereira
7. Harrison Chase
8. Will Fu-Hinthorn
9. Aditya G. Parameswaran
10. Eugene Wu
Incoming Citations (Sorted by PageRank)
Showing 6 of 6 citing papers.
| Rank | Citing Paper | Year | Venue | PageRank |
|---|---|---|---|---|
| 1,963 | DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing | 2025 | VLDB | 9.929429e-05 |
| 9,729 | Semantic Integrity Constraints: Declarative Guardrails for AI-Augmented Data Processing Systems | 2025 | VLDB | 4.2942813e-05 |
| 9,985 | Making Prompts First-Class Citizens for Adaptive LLM Pipelines | 2026 | CIDR | 4.1945683e-05 |
| 10,432 | D-Bot: An LLM-Powered DBA Copilot | 2025 | SIGMOD | 4.1945683e-05 |
| 10,658 | LLMLog: Advanced Log Template Generation via LLM-driven Multi-Round Annotation | 2025 | VLDB | 4.1945683e-05 |
| 13,106 | Prompt Editor: A Taxonomy-driven System for Guided LLM Prompt Development in Enterprise Settings | 2025 | SIGMOD | - |
Outgoing Citations (Sorted by PageRank)
Showing 8 of 8 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | PageRank |
|---|---|---|---|---|
| 254 | Snorkel: Rapid Training Data Creation with Weak Supervision | 2018 | VLDB | 0.00030540555 |
| 517 | Can Foundation Models Wrangle Your Data? | 2023 | VLDB | 0.00021169035 |
| 1,482 | Automating Large-Scale Data Quality Verification | 2018 | VLDB | 0.00011725533 |
| 3,840 | Revisiting Prompt Engineering via Declarative Crowdsourcing | 2024 | CIDR | 6.7106924e-05 |
| 3,995 | How Large Language Models Will Disrupt Data Management | 2023 | VLDB | 6.5513237e-05 |
| 4,003 | Data Platform for Machine Learning | 2019 | SIGMOD | 6.54347e-05 |
| 6,134 | Finding Label and Model Errors in Perception Data With Learned Observation Assertions | 2022 | SIGMOD | 5.1943414e-05 |
| 7,138 | Ease.ml/ci and Ease.ml/meter in Action: Towards Data Management for Statistical Generalization | 2019 | VLDB | 4.8216981e-05 |