HAIPipe: Combining Human-generated and Machine-generated Pipelines for Data Preparation
Summary: HAIPipe fuses human-generated pipelines (HI-pipelines) with machine-generated pipelines (AI-pipelines) into HAI-pipelines that outperform either kind alone. It uses an enumeration-sampling framework and an RL-guided AI-pipeline search; experiments on more than 1,400 real-world HI-pipelines show gains. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Sibei Chen
2. Nan Tang
3. Ju Fan
4. Xuemi Yan
5. Chengliang Chai
6. Guoliang Li
7. Xiaoyong Du
Incoming Citations (Sorted by Pagerank)
Showing 5 of 5 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 6,765 | Automatic Database Configuration Debugging using Retrieval-Augmented Language Models | 2025 | SIGMOD | 4.9325583e-05 |
| 7,931 | In-depth Analysis of Graph-based RAG in a Unified Framework | 2025 | VLDB | 4.613363e-05 |
| 8,743 | CtxPipe: Context-aware Data Preparation Pipeline Construction for Machine Learning | 2024 | SIGMOD | 4.456315e-05 |
| 9,371 | Auto-Formula: Recommend Formulas in Spreadsheets using Contrastive Learning for Table Representations | 2024 | SIGMOD | 4.3480692e-05 |
| 10,682 | AutoPrep: Natural Language Question-Aware Data Preparation with a Multi-Agent Framework | 2025 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 7 of 7 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 921 | Democratizing Data Science through Interactive Curation of ML Pipelines | 2019 | SIGMOD | 0.00015337438 |
| 1,391 | Ease.ml: Towards Multi-tenant Resource Sharing for Machine Learning Workloads | 2018 | VLDB | 0.0001223506 |
| 1,993 | Automatically Generating Data Exploration Sessions Using Deep Reinforcement Learning | 2020 | SIGMOD | 9.8453334e-05 |
| 3,252 | Auto-Suggest: Learning-to-Recommend Data Preparation Steps Using Data Science Notebooks | 2020 | SIGMOD | 7.3178277e-05 |
| 5,383 | Auto-Pipeline: Synthesizing Complex Data Pipelines By-Target Using Reinforcement Learning and Search | 2021 | VLDB | 5.5393038e-05 |
| 6,569 | Domain Adaptation for Deep Entity Resolution | 2022 | SIGMOD | 5.0065379e-05 |
| 8,406 | DADER: Hands-Off Entity Resolution with Domain Adaptation | 2022 | VLDB | 4.5220083e-05 |