Demonstrating CatDB: LLM-based Generation of Data-centric ML Pipelines
Summary: CatDB generates data-centric ML pipelines with an LLM, guided by dataset-specific instructions drawn from profiling metadata and a refined data catalog. Splitting the instructions into clean, transform, and train steps yields large speedups over state-of-the-art LLM-based and AutoML approaches on large datasets.
(summarized by gpt-5-nano on Feb 09 2026)
- Paper ID: 7146
- Venue: SIGMOD
- Year: 2025
- Pagerank: -
- Overall Rank: 13,098 | 8.88%
- DOI: 10.1145/3722212.3725097
Incoming Non-self Citations Over Time
No non-self incoming citations found for this paper in this database.
Incoming Citations (Sorted by Pagerank)
Showing 0 of 0 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 0 of 0 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 8,736 | Unveiling Challenges for LLMs in Enterprise Data Engineering | 2026 | VLDB | 4.456315e-05 |
| 10,142 | AutoDDG: Automated Dataset Description Generation using Large Language Models | 2026 | SIGMOD | 4.1945683e-05 |
| 10,452 | ScaleLLM: A Technique for Scalable LLM-augmented Data Systems | 2025 | SIGMOD | 4.1945683e-05 |
| 6,389 | Chat2Data: An Interactive Data Analysis System with RAG, Vector Databases and LLMs | 2024 | VLDB | 5.0844009e-05 |
| 8,743 | CtxPipe: Context-aware Data Preparation Pipeline Construction for Machine Learning | 2024 | SIGMOD | 4.456315e-05 |
| 9,243 | Demonstration of DB-GPT: Next Generation Data Interaction System Empowered by Large Language Models | 2024 | VLDB | 4.3690661e-05 |
| 7,020 | LLM for Data Management | 2024 | VLDB | 4.8595728e-05 |
| 8,974 | DataLoom: Simplifying Data Loading with LLMs | 2024 | VLDB | 4.4184286e-05 |
| 10,316 | LLM-AutoDP: Automatic Data Processing via LLM Agents for Model Fine-tuning | 2026 | VLDB | 4.1945683e-05 |
| 10,628 | CatDB: Data-catalog-guided, LLM-based Generation of Data-centric ML Pipelines | 2025 | VLDB | 4.1945683e-05 |