The Fast and the Private: Task-based Dataset Search
Summary: Mileena uses pre-computed semi-ring sketches to rapidly evaluate the joins and unions that augment a requester's dataset for task-driven ML, enabling low-latency training and evaluation over large corpora. The paper introduces a Factorized Privacy Mechanism for scalable differential privacy with minimal utility loss, and integrates LLM-based transformation agents plus semi-ring extensions for causal discovery and treatment-effect estimation.
(summarized by gpt-5-mini on Feb 09 2026)
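The summary's idea of evaluating joins from pre-computed sketches can be illustrated with a small sketch. This is not Mileena's implementation: the names `Sketch`, `build_sketch`, `join_moments`, and `fit_line` are hypothetical, and the example assumes a single numeric column per dataset and a single join key. It shows only the general principle that per-key aggregates (count, sum, sum of squares) add under union and multiply under join, so a simple linear model over the join can be fitted without materializing the joined rows.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sketch:
    """Per-key aggregates of one numeric column: count, sum, sum of squares."""
    c: float = 0.0
    s: float = 0.0
    q: float = 0.0

    def __add__(self, other):
        # Union of two row sets: the aggregates add component-wise.
        return Sketch(self.c + other.c, self.s + other.s, self.q + other.q)


def build_sketch(rows):
    """rows: iterable of (join_key, value). Returns {join_key: Sketch}."""
    out = defaultdict(Sketch)
    for key, v in rows:
        out[key] = out[key] + Sketch(1.0, v, v * v)
    return dict(out)


def join_moments(sk_x, sk_y):
    """Moments of (x, y) over the equi-join, computed from sketches only."""
    n = sx = sy = sxy = sxx = 0.0
    for key in sk_x.keys() & sk_y.keys():
        a, b = sk_x[key], sk_y[key]
        n += a.c * b.c      # each x row pairs with each y row of the same key
        sx += a.s * b.c     # every x value is repeated b.c times
        sy += b.s * a.c     # every y value is repeated a.c times
        sxy += a.s * b.s    # sum over all pairs of x * y
        sxx += a.q * b.c    # every x^2 is repeated b.c times
    return n, sx, sy, sxy, sxx


def fit_line(sk_x, sk_y):
    """Least-squares slope and intercept of y ~ x over the join."""
    n, sx, sy, sxy, sxx = join_moments(sk_x, sk_y)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Usage: the requester holds x, a candidate dataset holds y, joined on the key.
requester = build_sketch([("a", 1.0), ("b", 2.0), ("c", 3.0)])
candidate = build_sketch([("a", 2.0), ("b", 4.0), ("c", 6.0)])
slope, intercept = fit_line(requester, candidate)
```

Because only the small per-key sketches are touched at search time, the cost of scoring a candidate depends on the number of distinct join keys rather than on the size of the join result.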
- Paper ID: 519
- Venue: CIDR
- Year: 2024
- Pagerank: 5.2229324e-05
- Overall Rank: 6,077 | 57.73%
- DOI: -
Incoming Non-self Citations Over Time
Incoming Citations (Sorted by Pagerank)
Showing 4 of 4 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 16 of 16 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 453 | Towards Practical Differential Privacy for SQL Queries | 2018 | VLDB | 0.00022741848 |
| 517 | Can Foundation Models Wrangle Your Data? | 2023 | VLDB | 0.00021169035 |
| 583 | FAQ: Questions Asked Frequently | 2016 | PODS | 0.00019717214 |
| 834 | Learning Linear Regression Models over Factorized Joins | 2016 | SIGMOD | 0.00016135159 |
| 1,337 | HoloDetect: Few-Shot Learning for Error Detection | 2019 | SIGMOD | 0.00012497164 |
| 1,449 | Causal Relational Learning | 2020 | SIGMOD | 0.0001193267 |
| 1,463 | ARDA: Automatic Relational Data Augmentation for Machine Learning | 2020 | VLDB | 0.00011869295 |
| 1,751 | Auctus: A Dataset Search Engine for Data Discovery and Augmentation | 2021 | VLDB | 0.00010683295 |
| 2,899 | Privacy at Scale: Local Differential Privacy in Practice | 2018 | SIGMOD | 7.9443198e-05 |
| 3,750 | Data Acquisition for Improving Machine Learning Models | 2021 | VLDB | 6.7895763e-05 |
| 5,267 | Practical Differential Privacy via Grouping and Smoothing | 2013 | VLDB | 5.5972313e-05 |
| 5,976 | Responsible Data Integration: Next-generation Challenges | 2022 | SIGMOD | 5.245976e-05 |
| 6,449 | Causal Data Integration | 2023 | VLDB | 5.0587746e-05 |
| 7,401 | Frequency Estimation Under Multiparty Differential Privacy: One-shot and Streaming | 2022 | VLDB | 4.7397228e-05 |
| 7,491 | Saibot: A Differentially Private Data Search Platform | 2023 | VLDB | 4.7180617e-05 |
| 7,920 | JoinBoost: Grow Trees Over Normalized Data Using Only SQL | 2023 | VLDB | 4.6163888e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 11,251 | Fast Search-By-Classification for Large-Scale Databases Using Index-Aware Decision Trees and Random Forests | 2023 | VLDB | 4.1945683e-05 |
| 9,351 | On Efficient Approximate Queries over Machine Learning Models | 2023 | VLDB | 4.3524472e-05 |
| 495 | Milvus: A Purpose-Built Vector Data Management System | 2021 | SIGMOD | 0.00021767688 |
| 13,109 | SemExplorer: A User Interface for Semantic Approach to Customized Dataset Search | 2025 | SIGMOD | - |
| 10,341 | A Theoretical Framework for Distribution-Aware Dataset Search | 2025 | PODS | 4.1945683e-05 |
| 10,439 | Finding What You’re Looking For: A Distribution-Aware Dataset Search Engine in Action | 2025 | SIGMOD | 4.1945683e-05 |
| 10,329 | Revisiting Task-Oriented Dataset Search in the Era of Large Language Models: Challenges, Benchmark, and Solution | 2026 | VLDB | 4.1945683e-05 |
| 10,499 | Privacy and Accuracy-Aware AI/ML Model Deduplication | 2025 | SIGMOD | 4.1945683e-05 |
| 4,003 | Data Platform for Machine Learning | 2019 | SIGMOD | 6.54347e-05 |
| 7,491 | Saibot: A Differentially Private Data Search Platform | 2023 | VLDB | 4.7180617e-05 |