Database Paper Browser


Auto-Prep: Holistic Prediction of Data Preparation Steps for Self-Service Business Intelligence

Summary: Auto-Prep is a system that holistically predicts the intertwined data-transformation and join steps in self-service BI workflows, learned from about 2,000 real BI projects. It uses a Steiner-tree-inspired graph algorithm with provable guarantees and achieves over 70% accuracy, outperforming prior approaches and GPT-4. (summarized by gpt-5-mini on Feb 09 2026)
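The summary names a Steiner-tree-inspired algorithm but gives no details. As a rough illustration of the underlying idea only, the sketch below implements the classic metric-closure 2-approximation for the Steiner tree problem (Kou, Markowsky and Berman) over a hypothetical graph whose nodes are tables and whose edge weights are assumed join costs (lower meaning a more plausible join). The table names, weights, and all function names are illustrative assumptions; this is not Auto-Prep's algorithm, which also predicts transformation steps and comes with its own guarantees.

```python
import heapq


def build_graph(edge_list):
    """Undirected weighted graph as dict: node -> {neighbor: weight} (hypothetical helper)."""
    graph = {}
    for u, v, w in edge_list:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph


def dijkstra(graph, source):
    """Shortest-path distances and predecessor links from `source`."""
    dist, prev = {source: 0.0}, {source: None}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    return dist, prev


def path_edges(prev, target):
    """Walk predecessor links back to the Dijkstra source, collecting undirected edges."""
    edges = []
    while prev[target] is not None:
        edges.append(frozenset((prev[target], target)))
        target = prev[target]
    return edges


def mst_edges(nodes, weight):
    """Prim's algorithm; weight(u, v) returns None when there is no edge."""
    nodes = list(nodes)
    if not nodes:
        return []
    in_tree, result = {nodes[0]}, []
    while len(in_tree) < len(nodes):
        best = None
        for u in in_tree:
            for v in nodes:
                w = None if v in in_tree else weight(u, v)
                if w is not None and (best is None or w < best[0]):
                    best = (w, u, v)
        if best is None:
            raise ValueError("nodes are not connected")
        in_tree.add(best[2])
        result.append(frozenset(best[1:]))
    return result


def steiner_tree(graph, terminals):
    """Metric-closure 2-approximation of a minimum Steiner tree over `terminals`."""
    terminals = list(terminals)
    if len(terminals) <= 1:
        return set()
    sp = {t: dijkstra(graph, t) for t in terminals}
    # 1) MST of the complete "shortest-path distance" graph on the terminals.
    closure = mst_edges(terminals, lambda u, v: sp[u][0].get(v))
    # 2) Expand each closure edge into its shortest path in the original graph.
    used = set()
    for e in closure:
        u, v = tuple(e)
        used.update(path_edges(sp[u][1], v))
    # 3) MST of the subgraph formed by the expanded paths (removes any cycles).
    sub_nodes = {n for e in used for n in e}
    tree = set(mst_edges(sub_nodes, lambda u, v: graph[u][v] if frozenset((u, v)) in used else None))
    # 4) Repeatedly prune leaves that are not terminals.
    while True:
        degree = {}
        for e in tree:
            for n in e:
                degree[n] = degree.get(n, 0) + 1
        leaves = {n for n, d in degree.items() if d == 1 and n not in terminals}
        if not leaves:
            return tree
        tree = {e for e in tree if not (e & leaves)}


# Hypothetical schema graph: tables joined by assumed join costs.
graph = build_graph([
    ("sales", "customers", 1.0), ("sales", "products", 1.0),
    ("customers", "regions", 1.0), ("products", "regions", 4.0),
    ("sales", "calendar", 2.0),
])
join_tree = steiner_tree(graph, ["customers", "products", "regions"])
print(sorted(tuple(sorted(e)) for e in join_tree))
# [('customers', 'regions'), ('customers', 'sales'), ('products', 'sales')]
```

In this toy example the tree routes through the non-terminal table `sales` (a Steiner node) instead of using the expensive direct `products`-`regions` edge, and it leaves out the unneeded `calendar` table.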

Paper ID: 13872
Venue: VLDB
Year: 2025
Pagerank: 4.1945683e-05
Overall Rank: 10,598 | 26.28%
DOI: 10.14778/3734839.3734856

Incoming Non-self Citations Over Time

No non-self incoming citations found for this paper in this database.

Authors

Incoming Citations (Sorted by Pagerank)

Showing 0 of 0 citing papers.


Outgoing Citations (Sorted by Pagerank)

Showing 27 of 27 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
492 | Query by Output | 2009 | SIGMOD | 0.00021974699
517 | Can Foundation Models Wrangle Your Data? | 2023 | VLDB | 0.00021169035
1,187 | JOSIE: Overlap Set Similarity Search for Finding Joinable Tables in Data Lakes | 2019 | SIGMOD | 0.00013443639
1,267 | Foofah: Transforming Data By Example | 2017 | SIGMOD | 0.00012936483
1,337 | HoloDetect: Few-Shot Learning for Error Detection | 2019 | SIGMOD | 0.00012497164
1,469 | BlinkFill: Semi-supervised Programming By Example for Syntactic String Transformations | 2016 | VLDB | 0.00011836053
1,664 | On Multi-Column Foreign Key Discovery | 2010 | VLDB | 0.00010976887
1,894 | Baran: Effective Error Correction via a Unified Context Representation and Transfer Learning | 2020 | VLDB | 0.0001018378
2,158 | Uni-Detect: A Unified Approach to Automated Error Detection in Tables | 2019 | SIGMOD | 9.4141354e-05
2,295 | OLAP and Statistical Databases: Similarities and Differences | 1997 | PODS | 9.0782994e-05
2,506 | Auto-Detect: Data-Driven Error Detection in Tables | 2018 | SIGMOD | 8.6335464e-05
2,587 | Table-GPT: Table Fine-tuned GPT for Diverse Table Tasks | 2024 | SIGMOD | 8.4924618e-05
2,968 | Raha: A Configuration-Free Error Detection System | 2019 | SIGMOD | 7.7985097e-05
3,299 | SCODED: Statistical Constraint Oriented Data Error Detection | 2020 | SIGMOD | 7.2546659e-05
3,393 | Lux: Always-on Visualization Recommendations for Exploratory Dataframe Workflows | 2022 | VLDB | 7.1483239e-05
3,478 | Transform-Data-by-Example (TDE): An Extensible Search Engine for Data Transformations | 2018 | VLDB | 7.054159e-05
3,735 | Auto-Join: Joining Tables by Leveraging Transformations | 2017 | VLDB | 6.8061318e-05
4,850 | SEMA-JOIN: Joining Semantically-Related Tables Using Big Table Corpora | 2015 | VLDB | 5.8768452e-05
5,096 | Auto-Transform: Learning-to-Transform by Patterns | 2020 | VLDB | 5.7011825e-05
5,275 | Auto-Tables: Synthesizing Multi-Step Transformations to Relationalize Tables without Using Examples | 2023 | VLDB | 5.5905507e-05
5,383 | Auto-Pipeline: Synthesizing Complex Data Pipelines By-Target Using Reinforcement Learning and Search | 2021 | VLDB | 5.5393038e-05
5,434 | Auto-FuzzyJoin: Auto-Program Fuzzy Similarity Joins Without Labeled Examples | 2021 | SIGMOD | 5.5045402e-05
5,486 | Fast Foreign-Key Detection in Microsoft SQL Server PowerPivot for Excel | 2014 | VLDB | 5.4811603e-05
6,800 | DTT: An Example-Driven Tabular Transformer for Joinability by Leveraging Large Language Models | 2024 | SIGMOD | 4.9231471e-05
8,042 | Transform-Data-by-Example (TDE): Extensible Data Transformation in Excel | 2018 | SIGMOD | 4.5994569e-05
9,399 | TabulaX: Leveraging Large Language Models for Multi-Class Table Transformations | 2025 | VLDB | 4.3441378e-05
9,490 | Auto-BI: Automatically Build BI-Models Leveraging Local Join Prediction and Global Schema Graph | 2023 | VLDB | 4.3341665e-05

Semantically Similar Papers