Database Paper Browser

In-Database Data Imputation

Summary: The paper performs data imputation inside the database using MICE (Multiple Imputation by Chained Equations), with computation sharing and a ring abstraction to speed up training. It learns stochastic linear regression models for continuous values and Gaussian discriminant analysis models for categorical values directly in the database. The PostgreSQL and DuckDB implementations outperform prior MICE implementations and model-based methods by up to two orders of magnitude while preserving relationships in the data. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 6879
Venue: SIGMOD
Year: 2024
Pagerank: 4.269353e-05
Overall Rank: 9,856 | 31.44%
DOI: 10.1145/3639326

Incoming Citations (Sorted by Pagerank)

Showing 1 of 1 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
10,617 | Deduplicated Sampling On-Demand | 2025 | VLDB | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 35 of 35 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
140 | The MADlib Analytics Library or MAD Skills, the SQL | 2012 | VLDB | 0.00042270404
192 | HoloClean: Holistic Data Repairs with Probabilistic Inference | 2017 | VLDB | 0.00035728858
557 | SystemML: Declarative Machine Learning on Spark | 2016 | VLDB | 0.00020197988
658 | Towards a Unified Architecture for in-RDBMS Analytics | 2012 | SIGMOD | 0.00018506577
834 | Learning Linear Regression Models over Factorized Joins | 2016 | SIGMOD | 0.00016135159
1,167 | Learning Generalized Linear Models Over Normalized Data | 2015 | SIGMOD | 0.00013547713
1,238 | Incremental Query Evaluation in a Ring of Databases | 2010 | PODS | 0.00013114581
1,279 | Towards Linear Algebra over Normalized Data | 2017 | VLDB | 0.00012868394
1,404 | Responsible Data Management | 2020 | VLDB | 0.00012174977
1,482 | Automating Large-Scale Data Quality Verification | 2018 | VLDB | 0.00011725533
1,532 | Data Management in Machine Learning: Challenges, Techniques, and Systems | 2017 | SIGMOD | 0.00011472681
1,612 | Detecting Data Errors: Where are we and what needs to be done? | 2016 | VLDB | 0.00011142794
1,627 | Data Cleaning: Overview and Emerging Challenges | 2016 | SIGMOD | 0.00011086905
1,894 | Baran: Effective Error Correction via a Unified Context Representation and Transfer Learning | 2020 | VLDB | 0.0001018378
2,276 | Mind the Gap: An Experimental Evaluation of Imputation of Missing Values Techniques in Time Series | 2020 | VLDB | 9.1261944e-05
2,302 | Nearest Neighbor Classifiers over Incomplete Information: From Certain Answers to Certain Predictions | 2021 | VLDB | 9.0668832e-05
2,573 | Query Optimization for Dynamic Imputation | 2017 | VLDB | 8.518235e-05
2,574 | Discovery of Genuine Functional Dependencies from Relational Data with Missing Values | 2018 | VLDB | 8.5173637e-05
3,277 | A Layered Aggregate Engine for Analytics Workloads | 2019 | SIGMOD | 7.2871625e-05
3,311 | Efficient and Effective Data Imputation with Influence Functions | 2022 | VLDB | 7.2406486e-05
3,825 | Cleanits: A Data Cleaning System for Industrial Time Series | 2019 | VLDB | 6.7255837e-05
4,197 | Incremental View Maintenance with Triple Lock Factorization Benefits | 2018 | SIGMOD | 6.367895e-05
4,332 | Missing Value Imputation on Multidimensional Time Series | 2021 | VLDB | 6.2805243e-05
4,613 | F-IVM: Learning over Fast-Evolving Relational Data | 2020 | SIGMOD | 6.0478676e-05
5,028 | Adaptive Data Augmentation for Supervised Learning over Missing Data | 2021 | VLDB | 5.7506746e-05
5,153 | Horizon: Scalable Dependency-driven Data Cleaning | 2021 | VLDB | 5.6607963e-05
5,388 | Troubles with Nulls, Views from the Users | 2022 | VLDB | 5.5373113e-05
6,280 | Self-supervised and Interpretable Data Cleaning with Sequence Generative Adversarial Networks | 2023 | VLDB | 5.1290457e-05
6,727 | ORBITS: Online Recovery of Missing Values in Multiple Time Series Streams | 2021 | VLDB | 4.9483604e-05
7,634 | ReStore - Neural Data Completion for Relational Databases | 2021 | SIGMOD | 4.6911382e-05
7,867 | Learning Over Dirty Data Without Cleaning | 2020 | SIGMOD | 4.6320452e-05
7,920 | JoinBoost: Grow Trees Over Normalized Data Using Only SQL | 2023 | VLDB | 4.6163888e-05
8,005 | Online Topic-Aware Entity Resolution Over Incomplete Data Streams | 2021 | SIGMOD | 4.6081461e-05
8,138 | Fast and Reliable Missing Data Contingency Analysis with Predicate-Constraints | 2020 | SIGMOD | 4.5771031e-05
9,577 | CoClean: Collaborative Data Cleaning | 2020 | SIGMOD | 4.3248438e-05

Semantically Similar Papers