Complaint-Driven Training Data Debugging at Interactive Speeds
Summary: Rain++ supports complaint-driven debugging of training data for inference queries: given complaints, it ranks the offending training examples. Precomputation decouples the cost from model size, enabling interactive latency of about 1 ms for multi-million-parameter models and supporting standing and streaming queries. (summarized by gpt-5-nano on Feb 09 2026)
Incoming Non-self Citations Over Time
Authors
1. Lampros Flokas
2. Weiyuan Wu
3. Yejia Liu
4. Jiannan Wang
5. Nakul Verma
6. Eugene Wu
Incoming Citations (Sorted by Pagerank)
Showing 2 of 2 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 5,313 | XInsight: eXplainable Data Analysis Through The Lens of Causality | 2023 | SIGMOD | 5.573009e-05 |
| 8,257 | Automating and Optimizing Data-Centric What-If Analyses on Native Machine Learning Pipelines | 2023 | SIGMOD | 4.5487511e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 11 of 11 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 31 | Provenance Semirings | 2007 | PODS | 0.0007857786 |
| 140 | The MADlib Analytics Library or MAD Skills, the SQL | 2012 | VLDB | 0.00042270404 |
| 214 | Scorpion: Explaining Away Outliers in Aggregate Queries | 2013 | VLDB | 0.0003363692 |
| 1,106 | Provenance for Aggregate Queries | 2011 | PODS | 0.0001398766 |
| 1,371 | Tiresias: The Database Oracle for How-To Queries | 2012 | SIGMOD | 0.00012323502 |
| 1,482 | Automating Large-Scale Data Quality Verification | 2018 | VLDB | 0.00011725533 |
| 2,154 | DIFF: A Relational Interface for Large-Scale Data Explanation | 2019 | VLDB | 9.4208667e-05 |
| 2,753 | Complaint-driven Training Data Debugging for Query 2.0 | 2020 | SIGMOD | 8.1724339e-05 |
| 3,105 | Data X-Ray: A Diagnostic Tool for Data Errors | 2015 | SIGMOD | 7.5568954e-05 |
| 4,424 | PrIU: A Provenance-Based Approach for Incrementally Updating Regression Models | 2020 | SIGMOD | 6.198474e-05 |
| 5,191 | Going Beyond Provenance: Explaining Query Answers with Pattern-based Counterbalances | 2019 | SIGMOD | 5.6378768e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 8,220 | PerfGuard: Deploying ML-for-Systems without Performance Regressions, Almost! | 2021 | VLDB | 4.5557328e-05 |
| 7,061 | Serving Deep Learning Models with Deduplication from Relational Databases | 2022 | VLDB | 4.8463881e-05 |
| 6,796 | InferDB: In-Database Machine Learning Inference Using Indexes | 2024 | VLDB | 4.9241624e-05 |
| 884 | Plan-Structured Deep Neural Network Models for Query Performance Prediction | 2019 | VLDB | 0.00015654004 |
| 608 | DeepDB: Learn from Data, not from Queries! | 2020 | VLDB | 0.00019235898 |
| 5,473 | Facilitating SQL Query Composition and Analysis | 2020 | SIGMOD | 5.4885366e-05 |
| 329 | Accelerating Machine Learning Inference with Probabilistic Predicates | 2018 | SIGMOD | 0.00027249545 |
| 7,989 | RCRank: Multimodal Ranking of Root Causes of Slow Queries in Cloud Database Systems | 2025 | VLDB | 4.6124681e-05 |
| 5,222 | Enabling SQL-based Training Data Debugging for Federated Learning | 2022 | VLDB | 5.6210545e-05 |
| 2,753 | Complaint-driven Training Data Debugging for Query 2.0 | 2020 | SIGMOD | 8.1724339e-05 |