Database Paper Browser

Auto-Suggest: Learning-to-Recommend Data Preparation Steps Using Data Science Notebooks

Summary: Auto-Suggest learns to recommend data preparation steps by mining the data manipulations found in data science notebooks. It crawls 4M Jupyter notebooks from GitHub and replays their steps to log the inputs, outputs, and decisions of each step, then uses these logs to learn data-driven recommendations that beat baselines. (summarized by gpt-5-nano on Feb 09 2026)
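
The replay-and-log idea in the summary can be illustrated with a small sketch. This is not the paper's implementation: `CALL_LOG`, `logged`, and the toy `join` function are hypothetical names introduced here, and a list of dicts stands in for a real table. It only shows how wrapping an operation lets each replayed call record its inputs and output for later learning.

```python
import functools

# Hypothetical in-memory log; each entry records one replayed call.
CALL_LOG = []


def logged(op_name):
    """Wrap a function so every call's inputs and output are recorded."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)
            CALL_LOG.append(
                {"op": op_name, "args": args, "kwargs": kwargs, "output": result}
            )
            return result
        return wrapper
    return decorator


@logged("join")
def join(left, right, on):
    """Toy inner join over lists of dicts (a stand-in for a table join)."""
    index = {}
    for row in right:
        index.setdefault(row[on], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[on], [])]


# Replaying one step: the chosen join key ("id") is the decision that gets logged.
orders = [{"id": 1, "item": "pen"}, {"id": 2, "item": "ink"}]
prices = [{"id": 1, "price": 3}]
joined = join(orders, prices, on="id")
```

After the call, `CALL_LOG` holds one entry with the two input tables, the chosen join key, and the joined result, which is the kind of record a recommender could be trained on.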

Paper ID: 5953
Venue: SIGMOD
Year: 2020
Pagerank: 7.3178277e-05
Overall Rank: 3,252 | 77.38%
DOI: 10.1145/3318464.3389738

Incoming Citations (Sorted by Pagerank)

Showing 17 of 17 citing papers.

Rank Citing Paper Year Venue Pagerank
3,015 Chorus: Foundation Models for Unified Data Discovery and Exploration 2024 VLDB 7.7092391e-05
5,275 Auto-Tables: Synthesizing Multi-Step Transformations to Relationalize Tables without Using Examples 2023 VLDB 5.5905507e-05
5,280 Explaining Dataset Changes for Semantic Data Versioning with Explain-Da-V 2023 VLDB 5.5896735e-05
5,383 Auto-Pipeline: Synthesizing Complex Data Pipelines By-Target Using Reinforcement Learning and Search 2021 VLDB 5.5393038e-05
6,409 Fine-Grained Lineage for Safer Notebook Interactions 2021 VLDB 5.0756653e-05
6,895 Decentralized Actor Scheduling and Reference-based Storage in Xorbits: a Native Scalable Data Science Engine 2025 VLDB 4.8925595e-05
8,388 FEDEX: An Explainability Framework for Data Exploration Steps 2022 VLDB 4.5297787e-05
8,645 Predicate Pushdown for Data Science Pipelines 2023 SIGMOD 4.4772518e-05
8,828 HAIPipe: Combining Human-generated and Machine-generated Pipelines for Data Preparation 2023 SIGMOD 4.4407488e-05
9,371 Auto-Formula: Recommend Formulas in Spreadsheets using Contrastive Learning for Table Representations 2024 SIGMOD 4.3480692e-05
9,490 Auto-BI: Automatically Build BI-Models Leveraging Local Join Prediction and Global Schema Graph 2023 VLDB 4.3341665e-05
10,152 Data-Semantics-Aware Recommendation of Diverse Pivot Tables 2026 SIGMOD 4.1945683e-05
10,168 FlowPilot: A Suggestion System for Designing Scientific Workflows 2026 SIGMOD 4.1945683e-05
11,063 Searching Data Lakes for Nested and Joined Data 2024 VLDB 4.1945683e-05
11,103 LucidScript: Bottom-up Standardization for Data Preparation 2024 VLDB 4.1945683e-05
11,297 DataRinse: Semantic Transforms for Data preparation based on Code Mining 2023 VLDB 4.1945683e-05
11,429 Leam: An Interactive System for In-situ Visual Text Analysis 2021 CIDR 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 25 of 25 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank Cited Paper Year Venue Pagerank
98 XMark: A Benchmark for XML Data Management 2002 VLDB 0.00050023808
475 Mining Database Structure; Or, How to Build a Data Quality Browser 2002 SIGMOD 0.00022303253
600 Linear Road: A Stream Data Management Benchmark 2004 VLDB 0.0001938744
1,009 SnipSuggest: Context-Aware Autocompletion for SQL 2011 VLDB 0.00014653644
1,267 Foofah: Transforming Data By Example 2017 SIGMOD 0.00012936483
1,277 The Data Civilizer System 2017 CIDR 0.00012879695
1,317 Harvesting Relational Tables from Lists on the Web 2009 VLDB 0.00012625853
1,337 HoloDetect: Few-Shot Learning for Error Detection 2019 SIGMOD 0.00012497164
1,469 BlinkFill: Semi-supervised Programming By Example for Syntactic String Transformations 2016 VLDB 0.00011836053
1,482 Automating Large-Scale Data Quality Verification 2018 VLDB 0.00011725533
1,612 Detecting Data Errors: Where are we and what needs to be done? 2016 VLDB 0.00011142794
1,664 On Multi-Column Foreign Key Discovery 2010 VLDB 0.00010976887
2,097 Predictive Interaction for Data Transformation 2015 CIDR 9.5489822e-05
2,158 Uni-Detect: A Unified Approach to Automated Error Detection in Tables 2019 SIGMOD 9.4141354e-05
2,506 Auto-Detect: Data-Driven Error Detection in Tables 2018 SIGMOD 8.6335464e-05
3,299 SCODED: Statistical Constraint Oriented Data Error Detection 2020 SIGMOD 7.2546659e-05
3,478 Transform-Data-by-Example (TDE): An Extensible Search Engine for Data Transformations 2018 VLDB 7.054159e-05
3,690 Navigating the Data Lake with DATAMARAN: Automatically Extracting Structure from Log Datasets 2018 SIGMOD 6.8384476e-05
3,735 Auto-Join: Joining Tables by Leveraging Transformations 2017 VLDB 6.8061318e-05
3,742 TEGRA: Table Extraction by Global Record Alignment 2015 SIGMOD 6.7966898e-05
4,850 SEMA-JOIN: Joining Semantically-Related Tables Using Big Table Corpora 2015 VLDB 5.8768452e-05
5,486 Fast Foreign-Key Detection in Microsoft SQL Server PowerPivot for Excel 2014 VLDB 5.4811603e-05
6,195 WADaR: Joint Wrapper and Data Repair 2015 VLDB 5.1618114e-05
6,697 The TEXTURE Benchmark: Measuring Performance of Text Queries on a Relational DBMS 2005 VLDB 4.9577992e-05
8,499 Synthesizing Mapping Relationships Using Table Corpus 2017 SIGMOD 4.4975851e-05

Semantically Similar Papers