SMARTFEAT: Efficient Feature Construction through Feature-Level Foundation Model Interactions
Summary: SMARTFEAT uses foundation models (FMs) to synthesize informative new features through feature-level FM interactions. An intelligent operator selector guides this process so that it does not have to enumerate every operator/feature combination. A function generator emits efficient dataframe/lambda transformations instead of calling the FM once per row, which makes feature construction scalable and cost- and latency-efficient on large datasets. (summarized by gpt-5-mini on Feb 09 2026)
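The summary contrasts generating a transformation once with calling the FM for every row. The following minimal Python sketch illustrates that idea under stated assumptions. `MockFM`, `generate_function`, `construct_feature`, and the column names are hypothetical. The "FM" is a canned lookup rather than a real model, and rows are plain dicts rather than a dataframe. This is not SMARTFEAT's actual implementation.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]


class MockFM:
    """Hypothetical stand-in for a foundation model (assumption, not a real API).

    Given a feature description, it returns the source text of a Python lambda.
    A real system would prompt an FM here. The counter records how many FM
    calls were made.
    """

    def __init__(self) -> None:
        self.calls = 0

    def generate_function(self, description: str) -> str:
        self.calls += 1
        canned = {
            "bmi": "lambda r: r['weight_kg'] / (r['height_m'] ** 2)",
            "is_senior": "lambda r: 1.0 if r['age'] >= 65 else 0.0",
        }
        return canned[description]


def construct_feature(fm: MockFM, rows: List[Row], name: str, description: str) -> None:
    """Ask the FM once for a transformation, then apply it locally to every row."""
    source = fm.generate_function(description)  # one FM call per new feature
    fn: Callable[[Row], float] = eval(source)   # compile the returned lambda (sketch only: trusted input)
    for row in rows:                            # per-row work is plain Python, no FM call
        row[name] = fn(row)


if __name__ == "__main__":
    fm = MockFM()
    table = [{"weight_kg": 80.0, "height_m": 2.0, "age": 70.0} for _ in range(1000)]
    construct_feature(fm, table, "bmi", "bmi")
    construct_feature(fm, table, "is_senior", "is_senior")
    print(fm.calls)         # prints 2: one FM call per feature, not per row
    print(table[0]["bmi"])  # prints 20.0
```

Because the FM is queried once per new feature rather than once per row, the number of FM calls stays at two whether the table has two rows or two million. The per-row work is ordinary local code. Running FM-generated source through `eval` is acceptable only in a sketch. A real system would validate or sandbox it first.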
Authors
1. Yin Lin
2. Bolin Ding
3. H. V. Jagadish
4. Jingren Zhou
Incoming Citations (Sorted by Pagerank)
Showing 2 of 2 citing papers.
| Overall Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 9,476 | Adda: Towards Efficient in-Database Feature Generation via LLM-based Agents | 2025 | SIGMOD | 4.3341665e-05 |
| 10,628 | CatDB: Data-catalog-guided, LLM-based Generation of Data-centric ML Pipelines | 2025 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 4 of 4 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Overall Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 221 | Deep Entity Matching with Pre-Trained Language Models | 2021 | VLDB | 0.00033121824 |
| 517 | Can Foundation Models Wrangle Your Data? | 2023 | VLDB | 0.00021169035 |
| 1,612 | Detecting Data Errors: Where are we and what needs to be done? | 2016 | VLDB | 0.00011142794 |
| 2,349 | RPT: Relational Pre-trained Transformer Is Almost All You Need towards Democratizing Data Preparation | 2021 | VLDB | 8.9876423e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 517 | Can Foundation Models Wrangle Your Data? | 2023 | VLDB | 0.00021169035 |
| 2,194 | Enabling and Optimizing Non-linear Feature Interactions in Factorized Linear Algebra | 2019 | SIGMOD | 9.3138337e-05 |
| 12,546 | SMART: A Tool for Semantic-Driven Creation of Complex XML Mappings | 2005 | SIGMOD | 4.1945683e-05 |
| 9,476 | Adda: Towards Efficient in-Database Feature Generation via LLM-based Agents | 2025 | SIGMOD | 4.3341665e-05 |
| 8,847 | Towards Foundation Database Models | 2025 | CIDR | 4.4371897e-05 |
| 6,347 | A Relational Framework for Classifier Engineering | 2017 | PODS | 5.1019568e-05 |
| 11,547 | CAFE: Constraint-Aware Feature Extraction from Large Databases | 2020 | CIDR | 4.1945683e-05 |
| 10,963 | FeatureLTE: Learning to Estimate Feature Importance | 2024 | SIGMOD | 4.1945683e-05 |
| 13,097 | DANTE: Hybrid AI System for Context-Aware Interpretable Feature Engineering | 2025 | SIGMOD | - |
| 6,115 | An Integrated Development Environment for Faster Feature Engineering | 2014 | VLDB | 5.2042468e-05 |