Tuplex: Data Science in Python at Native Code Speed
Summary: Tuplex JIT-compiles Python UDFs into end-to-end native code for data pipelines. It uses a dual-mode execution model: an optimized fast path handles the common case, and slower exception paths handle the rows on which the fast path fails. This yields speedups of up to 91x over Spark and Dask and performance close to hand-tuned C++.
(summarized by gpt-5-nano on Feb 09 2026)
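The dual-mode idea in the summary can be illustrated with a small pure-Python sketch. This is not Tuplex's API or implementation: `run_dual_mode`, `fast_path`, and `general_path` are hypothetical names chosen for illustration, and plain Python functions stand in for the compiled fast path. The sketch only shows the control flow, assuming the fast path signals a failed row by raising an exception.

```python
from typing import Any, Callable, Iterable, List, Tuple


def run_dual_mode(
    rows: Iterable[Any],
    fast_path: Callable[[Any], Any],
    general_path: Callable[[Any], Any],
) -> Tuple[List[Any], List[Tuple[int, Any, Exception]]]:
    """Illustrative only: try a specialized fast path on every row and
    re-run the rows that fail on a slower, more general path."""
    results: List[Tuple[int, Any]] = []
    deferred: List[Tuple[int, Any]] = []

    # Pass 1: common case. Rows that raise are set aside, not fatal.
    for index, row in enumerate(rows):
        try:
            results.append((index, fast_path(row)))
        except Exception:
            deferred.append((index, row))

    # Pass 2: exception path for the rows the fast path could not handle.
    unresolved: List[Tuple[int, Any, Exception]] = []
    for index, row in deferred:
        try:
            results.append((index, general_path(row)))
        except Exception as exc:
            unresolved.append((index, row, exc))

    # Restore input order before returning the values.
    results.sort(key=lambda pair: pair[0])
    return [value for _, value in results], unresolved


def fast_path(row: str) -> int:
    # Hypothetical UDF specialized for the common case: integer strings.
    return int(row) * 2


def general_path(row: str) -> int:
    # Hypothetical fallback that also accepts decimal strings.
    return int(float(row)) * 2


values, unresolved = run_dual_mode(["1", "2", "3.5", "abc", "4"], fast_path, general_path)
```

Here `"3.5"` fails the fast path but succeeds on the general path, while `"abc"` fails both and is returned in `unresolved` with its input position and exception instead of aborting the whole run.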
- Paper ID: 6135
- Venue: SIGMOD
- Year: 2021
- Pagerank: 0.0001021625
- Overall Rank: 1,882 | 86.91%
- DOI: 10.1145/3448016.3457244
Incoming Non-self Citations Over Time
Incoming Citations (Sorted by Pagerank)
Showing 19 of 19 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
| --- | --- | --- | --- | --- |
| 5,476 | Containerized Execution of UDFs: An Experimental Evaluation | 2022 | VLDB | 5.4866534e-05 |
| 6,189 | Accelerating Python UDFs in Vectorized Query Execution | 2022 | CIDR | 5.1647573e-05 |
| 6,279 | Self-Organizing Data Containers | 2022 | CIDR | 5.1295282e-05 |
| 6,375 | Dear User-Defined Functions, Inlining isn't working out so great for us. Let's try batching to make our relationship work. Sincerely, SQL | 2024 | CIDR | 5.0923872e-05 |
| 6,378 | Mitigating the Impedance Mismatch between Prediction Query Execution and Database Engine | 2025 | SIGMOD | 5.0909804e-05 |
| 6,701 | YeSQL: “You extend SQL” with Rich and Highly Performant User-Defined Functions in Relational Databases | 2022 | VLDB | 4.9561066e-05 |
| 7,306 | DAPHNE: An Open and Extensible System Infrastructure for Integrated Data Analysis Pipelines | 2022 | CIDR | 4.7678574e-05 |
| 8,583 | Efficient Execution of User-Defined Functions in SQL Queries | 2023 | VLDB | 4.4919445e-05 |
| 8,645 | Predicate Pushdown for Data Science Pipelines | 2023 | SIGMOD | 4.4772518e-05 |
| 9,326 | BladeDISC: Optimizing Dynamic Shape Machine Learning Workloads via Compiler Approach | 2023 | SIGMOD | 4.3556432e-05 |
| 9,343 | The Key to Effective UDF Optimization: Before Inlining, First Perform Outlining | 2025 | VLDB | 4.3546206e-05 |
| 9,718 | YeSQL: Rich User-Defined Functions without the Overhead | 2022 | VLDB | 4.2980763e-05 |
| 9,763 | The UDFBench Benchmark for General-purpose UDF Queries | 2025 | VLDB | 4.2856106e-05 |
| 9,846 | HyperBlocker: Accelerating Rule-based Blocking in Entity Resolution using GPUs | 2025 | VLDB | 4.2721228e-05 |
| 10,459 | UDFBench: A Tool for Benchmarking UDF Queries on SQL Engines | 2025 | SIGMOD | 4.1945683e-05 |
| 10,471 | Approximating Opaque Top-k Queries | 2025 | SIGMOD | 4.1945683e-05 |
| 10,969 | Query Compilation Without Regrets | 2024 | SIGMOD | 4.1945683e-05 |
| 11,213 | Udon: Efficient Debugging of User-Defined Functions in Big Data Systems with Line-by-Line Control | 2023 | SIGMOD | 4.1945683e-05 |
| 11,288 | To UDFs and Beyond: Demonstration of a Fully Decomposed Data Processor for General Data Wrangling Tasks | 2023 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 13 of 13 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
| --- | --- | --- | --- | --- |
| 60 | Efficiently Compiling Efficient Query Plans for Modern Hardware | 2011 | VLDB | 0.00064439773 |
| 66 | Spark SQL: Relational Data Processing in Spark | 2015 | SIGMOD | 0.00061639801 |
| 704 | Building Efficient Query Engines in a High-Level Language | 2014 | VLDB | 0.00017900583 |
| 853 | Everything You Always Wanted to Know About Compiled and Vectorized Queries But Were Afraid to Ask | 2018 | VLDB | 0.00015940507 |
| 1,873 | An Architecture for Compiling UDF-centric Workflows | 2015 | VLDB | 0.00010253002 |
| 2,184 | A Sample-and-Clean Framework for Fast and Accurate Query Processing on Dirty Data | 2014 | SIGMOD | 9.3429789e-05 |
| 2,322 | Instant Loading for Main Memory Databases | 2013 | VLDB | 9.034874e-05 |
| 2,383 | How to Architect a Query Compiler | 2016 | SIGMOD | 8.9294108e-05 |
| 2,611 | Opening the Black Boxes in Data Flow Optimization | 2012 | VLDB | 8.4536967e-05 |
| 2,838 | How to Architect a Query Compiler, Revisited | 2018 | SIGMOD | 8.0408472e-05 |
| 2,896 | Evaluating End-to-End Optimization for Data Analytics Applications in Weld | 2018 | VLDB | 7.9452051e-05 |
| 4,410 | DBToaster: A SQL Compiler for High-Performance Delta Processing in Main-Memory Databases | 2009 | VLDB | 6.2091068e-05 |
| 6,384 | A Demonstration of DBWipes: Clean as You Query | 2012 | VLDB | 5.0880333e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
| --- | --- | --- | --- | --- |
| 11,782 | The Best of Both Worlds: Big Data Programming with Both Productivity and Performance | 2017 | SIGMOD | 4.1945683e-05 |
| 2,170 | tf.data: A Machine Learning Data Processing Framework | 2021 | VLDB | 9.3821603e-05 |
| 5,981 | DataPrep.EDA: Task-Centric Exploratory Data Analysis for Statistical Modeling in Python | 2021 | SIGMOD | 5.2448986e-05 |
| 2,418 | Tupleware: "Big" Data, Big Analytics, Small Clusters | 2015 | CIDR | 8.8556595e-05 |
| 1,750 | Weld: A Common Runtime for High Performance Data Analytics | 2017 | CIDR | 0.00010683647 |
| 2,954 | Magpie: Python at Speed and Scale using Cloud Backends | 2021 | CIDR | 7.8262582e-05 |
| 2,896 | Evaluating End-to-End Optimization for Data Analytics Applications in Weld | 2018 | VLDB | 7.9452051e-05 |
| 6,189 | Accelerating Python UDFs in Vectorized Query Execution | 2022 | CIDR | 5.1647573e-05 |
| 1,873 | An Architecture for Compiling UDF-centric Workflows | 2015 | VLDB | 0.00010253002 |
| 9,719 | Tuplex: Robust, Efficient Analytics When Python Rules | 2019 | VLDB | 4.2980763e-05 |