Database Paper Browser

CtxPipe: Context-aware Data Preparation Pipeline Construction for Machine Learning

Summary: CtxPipe automates the construction of context-aware data preparation pipelines for machine learning. It uses pretrained embeddings to capture the semantics of the data and guide the choice of pipeline components, and a deep reinforcement learning framework to search over candidate pipelines, delivering higher feature quality and faster models. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 7000
Venue: SIGMOD
Year: 2024
Pagerank: 4.456315e-05
Overall Rank: 8,743 | 39.18%
DOI: 10.1145/3698831

Incoming Citations (Sorted by Pagerank)

Showing 1 of 1 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
10,168 | FlowPilot: A Suggestion System for Designing Scientific Workflows | 2026 | SIGMOD | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 24 of 24 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
513 | TURL: Table Understanding through Representation Learning | 2021 | VLDB | 0.00021288342
791 | ActiveClean: Interactive Data Cleaning For Statistical Modeling | 2016 | VLDB | 0.00016629664
921 | Democratizing Data Science through Interactive Curation of ML Pipelines | 2019 | SIGMOD | 0.00015337438
1,047 | Functional Dependency Discovery: An Experimental Evaluation of Seven Algorithms | 2015 | VLDB | 0.00014459715
1,612 | Detecting Data Errors: Where are we and what needs to be done? | 2016 | VLDB | 0.00011142794
1,627 | Data Cleaning: Overview and Emerging Challenges | 2016 | SIGMOD | 0.00011086905
2,122 | SystemDS: A Declarative Machine Learning System for the End-to-End Data Science Lifecycle | 2020 | CIDR | 9.4989076e-05
2,253 | Efficient Denial Constraint Discovery with Hydra | 2018 | VLDB | 9.1937209e-05
2,302 | Nearest Neighbor Classifiers over Incomplete Information: From Certain Answers to Certain Predictions | 2021 | VLDB | 9.0668832e-05
2,349 | RPT: Relational Pre-trained Transformer Is Almost All You Need towards Democratizing Data Preparation | 2021 | VLDB | 8.9876423e-05
2,456 | Production Machine Learning Pipelines: Empirical Analysis and Optimization Opportunities | 2021 | SIGMOD | 8.7733773e-05
3,105 | Data X-Ray: A Diagnostic Tool for Data Errors | 2015 | SIGMOD | 7.5568954e-05
3,440 | Approximate Denial Constraints | 2020 | VLDB | 7.0918817e-05
3,467 | Data Profiling – A Tutorial | 2017 | SIGMOD | 7.069081e-05
4,682 | Scalable Discovery of Unique Column Combinations | 2014 | VLDB | 6.0022412e-05
5,192 | Pattern Functional Dependencies for Data Cleaning | 2020 | VLDB | 5.6375087e-05
5,429 | DiffPrep: Differentiable Data Preprocessing Pipeline Search for Learning over Tabular Data | 2023 | SIGMOD | 5.5087325e-05
6,437 | Fundamentals of Order Dependencies | 2012 | VLDB | 5.0631488e-05
6,944 | DataPrism: Exposing Disconnect between Data and Systems | 2022 | SIGMOD | 4.8912787e-05
7,202 | Conformance Constraint Discovery: Measuring Trust in Data-Driven Systems | 2021 | SIGMOD | 4.8023314e-05
7,719 | WindTunnel: Towards Differentiable ML Pipelines Beyond a Single Model | 2022 | VLDB | 4.6686188e-05
8,092 | Saga: A Scalable Framework for Optimizing Data Cleaning Pipelines for Machine Learning Applications | 2023 | SIGMOD | 4.587921e-05
8,341 | BugDoc: Algorithms to Debug Computational Processes | 2020 | SIGMOD | 4.5433282e-05
8,828 | HAIPipe: Combining Human-generated and Machine-generated Pipelines for Data Preparation | 2023 | SIGMOD | 4.4407488e-05