Brainwash: A Data System for Feature Engineering
Summary: Brainwash is a vision for a data system that eases feature engineering for large ML-driven systems. It aims to shorten the Explore–Extract–Evaluate loop and to reveal how feature code interacts with massive datasets, with a focus on faster iterative feedback and reuse of previous runs.
(summarized by gpt-5-mini on Feb 09 2026)
- Paper ID: 212
- Venue: CIDR
- Year: 2013
- Pagerank: 7.9078385e-05
- Overall Rank: 2,915 | 79.73%
- DOI: -
Incoming Citations (Sorted by Pagerank)
Showing 23 of 23 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
| --- | --- | --- | --- | --- |
| 609 | Monkey: Optimal Navigable Key-Value Store | 2017 | SIGMOD | 0.0001923446 |
| 903 | To Join or Not to Join? Thinking Twice about Joins before Feature Selection | 2016 | SIGMOD | 0.0001547016 |
| 1,167 | Learning Generalized Linear Models Over Normalized Data | 2015 | SIGMOD | 0.00013547713 |
| 1,532 | Data Management in Machine Learning: Challenges, Techniques, and Systems | 2017 | SIGMOD | 0.00011472681 |
| 1,666 | HELIX: Holistic Optimization for Accelerating Iterative Machine Learning | 2019 | VLDB | 0.0001096361 |
| 2,157 | The Data Calculator*: Data Structure Design and Cost Synthesis from First Principles and Learned Cost Models | 2018 | SIGMOD | 9.416022e-05 |
| 4,106 | Extracting Databases from Dark Data with DeepDive | 2016 | SIGMOD | 6.4456184e-05 |
| 4,129 | Are Key-Foreign Key Joins Safe to Avoid when Learning High-Capacity Classifiers? | 2018 | VLDB | 6.428887e-05 |
| 4,785 | Demonstration of Santoku: Optimizing Machine Learning over Normalized Data | 2015 | VLDB | 5.9236989e-05 |
| 5,308 | Key-Value Storage Engines | 2020 | SIGMOD | 5.576303e-05 |
| 5,806 | BlinkML: Efficient Maximum Likelihood Estimation with Probabilistic Guarantees | 2019 | SIGMOD | 5.3200643e-05 |
| 6,115 | An Integrated Development Environment for Faster Feature Engineering | 2014 | VLDB | 5.2042468e-05 |
| 6,347 | A Relational Framework for Classifier Engineering | 2017 | PODS | 5.1019568e-05 |
| 6,456 | From Auto-tuning One Size Fits All to Self-designed and Learned Data-intensive Systems | 2019 | SIGMOD | 5.0564619e-05 |
| 7,664 | Schema Independent Relational Learning | 2017 | SIGMOD | 4.6857329e-05 |
| 8,864 | Cerebro: A Layered Data Platform for Scalable Deep Learning | 2021 | CIDR | 4.4326439e-05 |
| 9,382 | Hephaestus: Data Reuse for Accelerating Scientific Discovery | 2015 | CIDR | 4.3457368e-05 |
| 10,177 | InferF: Declarative Factorization of AI/ML Inferences over Joins | 2026 | SIGMOD | 4.1945683e-05 |
| 11,476 | Enforcing Constraints for Machine Learning Systems via Declarative Feature Selection: An Experimental Study | 2021 | SIGMOD | 4.1945683e-05 |
| 11,975 | Which Concepts Are Worth Extracting? | 2014 | SIGMOD | 4.1945683e-05 |
| 12,020 | The Case for Personal Data-Driven Decision Making | 2014 | VLDB | 4.1945683e-05 |
| 13,360 | Faster Evaluation of Labor-Intensive Features | 2015 | CIDR | - |
| 13,448 | Ringtail: A Generalized Nowcasting System | 2013 | VLDB | - |
Outgoing Citations (Sorted by Pagerank)
Showing 0 of 0 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
| --- | --- | --- | --- | --- |
Semantically Similar Papers