Database Paper Browser

Weak-to-Strong Prompts with Lightweight-to-Powerful LLMs for High-Accuracy, Low-Cost, and Explainable Data Transformation

Summary: MegaTran uses a fine-tuned lightweight LLM to turn loose user prompts into structured task descriptions (Weak2StrongPrompt), then a powerful LLM generates transformation code (Prompt2Code). Sanity-check reflection with checklists and LazyRAG retrieval of external snippets make results explainable, low-cost, and 2.2–26.1% more accurate than prior methods. (summarized by gpt-5-mini on Feb 09 2026)

Paper ID: 13885
Venue: VLDB
Year: 2025
Pagerank: 4.1945683e-05
Overall Rank: 10,610 | 26.19%
DOI: 10.14778/3742728.3742734

Incoming Non-self Citations Over Time

No non-self incoming citations found for this paper in this database.

Authors

Incoming Citations (Sorted by Pagerank)

Showing 1 of 1 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
10,289 | LEAD: Iterative Data Selection for Efficient LLM Instruction Tuning | 2026 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 20 of 20 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
112 | Potter's Wheel: An Interactive Data Cleaning System | 2001 | VLDB | 0.00047045036
517 | Can Foundation Models Wrangle Your Data? | 2023 | VLDB | 0.00021169035
1,012 | NADEEF: A Commodity Data Cleaning System | 2013 | SIGMOD | 0.0001464733
1,267 | Foofah: Transforming Data By Example | 2017 | SIGMOD | 0.00012936483
1,277 | The Data Civilizer System | 2017 | CIDR | 0.00012879695
1,469 | BlinkFill: Semi-supervised Programming By Example for Syntactic String Transformations | 2016 | VLDB | 0.00011836053
1,831 | Synthesizing Entity Matching Rules by Examples | 2018 | VLDB | 0.00010384082
2,349 | RPT: Relational Pre-trained Transformer Is Almost All You Need towards Democratizing Data Preparation | 2021 | VLDB | 8.9876423e-05
3,015 | Chorus: Foundation Models for Unified Data Discovery and Exploration | 2024 | VLDB | 7.7092391e-05
3,192 | Towards Dependable Data Repairing with Fixing Rules | 2014 | SIGMOD | 7.4095761e-05
3,478 | Transform-Data-by-Example (TDE): An Extensible Search Engine for Data Transformations | 2018 | VLDB | 7.054159e-05
4,212 | Unicorn: A Unified Multi-tasking Model for Supporting Matching Tasks in Data Integration | 2023 | SIGMOD | 6.3555142e-05
4,908 | Combining Small Language Models and Large Language Models for Zero-Shot NL2SQL | 2024 | VLDB | 5.8339245e-05
5,096 | Auto-Transform: Learning-to-Transform by Patterns | 2020 | VLDB | 5.7011825e-05
5,280 | Explaining Dataset Changes for Semantic Data Versioning with Explain-Da-V | 2023 | VLDB | 5.5896735e-05
5,981 | DataPrep.EDA: Task-Centric Exploratory Data Analysis for Statistical Modeling in Python | 2021 | SIGMOD | 5.2448986e-05
6,800 | DTT: An Example-Driven Tabular Transformer for Joinability by Leveraging Large Language Models | 2024 | SIGMOD | 4.9231471e-05
7,458 | Trinity: An Extensible Synthesis Framework for Data Science | 2019 | VLDB | 4.7245406e-05
8,000 | Data Civilizer 2.0: A Holistic Framework for Data Preparation and Analytics | 2019 | VLDB | 4.6092803e-05
9,577 | CoClean: Collaborative Data Cleaning | 2020 | SIGMOD | 4.3248438e-05

Semantically Similar Papers