Database Paper Browser

RPT: Relational Pre-trained Transformer Is Almost All You Need towards Democratizing Data Preparation

Summary: RPT is a denoising tuple-to-tuple autoencoder built on a Transformer encoder-decoder architecture that unifies BERT and GPT. Once pre-trained, it supports data cleaning, auto-completion, normalization, and annotation, as well as few-shot and collaborative approaches to entity resolution (ER) and information extraction (IE). (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 12315
Venue: VLDB
Year: 2021
Pagerank: 8.9876423e-05
Overall Rank: 2,349 | 83.66%
DOI: 10.14778/3457390.3457391

Incoming Citations (Sorted by Pagerank)

Showing 27 of 27 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
1,541 | Symphony: Towards Natural Language Query Answering over Multi-modal Data Lakes | 2023 | CIDR | 0.00011456579
1,643 | CodexDB: Synthesizing Code for Query Processing from Natural Language Instructions using GPT-3 Codex | 2022 | VLDB | 0.0001104256
2,517 | Annotating Columns with Pre-trained Language Models | 2022 | SIGMOD | 8.6092139e-05
3,335 | DeepJoin: Joinable Table Discovery with Pre-trained Language Models | 2023 | VLDB | 7.2065006e-05
4,934 | From BERT to GPT-3 Codex: Harnessing the Potential of Very Large Language Models for Data Management | 2022 | VLDB | 5.8198826e-05
5,449 | Transformers for Tabular Data Representation: A Tutorial on Models and Applications | 2022 | VLDB | 5.5008652e-05
5,509 | Can Large Language Models Predict Data Correlations from Column Names? | 2023 | VLDB | 5.4703368e-05
6,092 | Observatory: Characterizing Embeddings of Relational Tables | 2024 | VLDB | 5.2138566e-05
6,280 | Self-supervised and Interpretable Data Cleaning with Sequence Generative Adversarial Networks | 2023 | VLDB | 5.1290457e-05
6,389 | Chat2Data: An Interactive Data Analysis System with RAG, Vector Databases and LLMs | 2024 | VLDB | 5.0844009e-05
6,553 | How do Categorical Duplicates Affect ML? A New Benchmark and Empirical Analyses | 2024 | VLDB | 5.0157344e-05
6,569 | Domain Adaptation for Deep Entity Resolution | 2022 | SIGMOD | 5.0065379e-05
6,737 | Demonstrating GPT-DB: Generating Query-Specific and Customizable Code for SQL Processing with GPT-4 | 2023 | VLDB | 4.9457488e-05
6,800 | DTT: An Example-Driven Tabular Transformer for Joinability by Leveraging Large Language Models | 2024 | SIGMOD | 4.9231471e-05
7,020 | LLM for Data Management | 2024 | VLDB | 4.8595728e-05
8,052 | Generating Succinct Descriptions of Database Schemata for Cost-Efficient Prompting of Large Language Models | 2024 | VLDB | 4.5953106e-05
8,186 | E2ETune: End-to-End Knob Tuning via Fine-tuned Generative Language Model | 2025 | VLDB | 4.5651684e-05
8,208 | SMARTFEAT: Efficient Feature Construction through Feature-Level Foundation Model Interactions | 2024 | CIDR | 4.5581306e-05
8,523 | Controllable Tabular Data Synthesis Using Diffusion Models | 2024 | SIGMOD | 4.4937074e-05
8,743 | CtxPipe: Context-aware Data Preparation Pipeline Construction for Machine Learning | 2024 | SIGMOD | 4.456315e-05
9,348 | GIDCL: A Graph-Enhanced Interpretable Data Cleaning Framework with Large Language Models | 2024 | SIGMOD | 4.3526427e-05
9,399 | TabulaX: Leveraging Large Language Models for Multi-Class Table Transformations | 2025 | VLDB | 4.3441378e-05
9,476 | Adda: Towards Efficient in-Database Feature Generation via LLM-based Agents | 2025 | SIGMOD | 4.3341665e-05
9,479 | Data Imputation with Limited Data Redundancy Using Data Lakes | 2025 | VLDB | 4.3341665e-05
9,777 | Data Augmentation for ML-driven Data Preparation and Integration | 2021 | VLDB | 4.2856106e-05
10,610 | Weak-to-Strong Prompts with Lightweight-to-Powerful LLMs for High-Accuracy, Low-Cost, and Explainable Data Transformation | 2025 | VLDB | 4.1945683e-05
11,347 | OpenTFV: An Open Domain Table-Based Fact Verification System | 2022 | SIGMOD | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 24 of 24 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
192 | HoloClean: Holistic Data Repairs with Probabilistic Inference | 2017 | VLDB | 0.00035728858
221 | Deep Entity Matching with Pre-Trained Language Models | 2021 | VLDB | 0.00033121824
265 | A Cost-Based Model and Effective Heuristic for Repairing Constraints by Value Modification | 2005 | SIGMOD | 0.00029763412
300 | Deep Learning for Entity Matching: A Design Space Exploration | 2018 | SIGMOD | 0.00028441466
513 | TURL: Table Understanding through Representation Learning | 2021 | VLDB | 0.00021288342
555 | Discovering Denial Constraints | 2013 | VLDB | 0.00020254908
712 | Magellan: Toward Building Entity Matching Management Systems | 2016 | VLDB | 0.00017732426
754 | Distributed Representations of Tuples for Entity Resolution | 2018 | VLDB | 0.00017117211
833 | Guided Data Repair | 2011 | VLDB | 0.00016138432
881 | Don’t be SCAREd: Use SCalable Automatic REpairing with Maximal Likelihood and Bounded Changes | 2013 | SIGMOD | 0.00015661103
1,159 | Towards Certain Fixes with Editing Rules and Master Data | 2010 | VLDB | 0.00013592813
1,277 | The Data Civilizer System | 2017 | CIDR | 0.00012879695
1,546 | KATARA: A Data Cleaning System Powered by Knowledge Bases and Crowdsourcing | 2015 | SIGMOD | 0.00011446851
1,612 | Detecting Data Errors: Where are we and what needs to be done? | 2016 | VLDB | 0.00011142794
1,894 | Baran: Effective Error Correction via a Unified Context Representation and Transfer Learning | 2020 | VLDB | 0.0001018378
1,914 | Creating Embeddings of Heterogeneous Relational Datasets for Data Integration Tasks | 2020 | SIGMOD | 0.00010109102
2,209 | Data Integration: After the Teenage Years | 2017 | PODS | 9.2868035e-05
2,460 | Combining Quantitative and Logical Data Cleaning | 2016 | VLDB | 8.7617484e-05
2,968 | Raha: A Configuration-Free Error Detection System | 2019 | SIGMOD | 7.7985097e-05
3,140 | ZeroER: Entity Resolution using Zero Labeled Examples | 2020 | SIGMOD | 7.4841763e-05
3,192 | Towards Dependable Data Repairing with Fixing Rules | 2014 | SIGMOD | 7.4095761e-05
5,192 | Pattern Functional Dependencies for Data Cleaning | 2020 | VLDB | 5.6375087e-05
6,534 | Automatic Rule Refinement for Information Extraction | 2010 | VLDB | 5.0244622e-05
8,517 | Understanding Workers, Developing Effective Tasks, and Enhancing Marketplace Dynamics: A Study of a Large Crowdsourcing Marketplace | 2017 | VLDB | 4.4943871e-05