Annotating Columns with Pre-trained Language Models
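Summary: Doduo is a multi-task framework built on pre-trained language models that annotates column types and the relations between columns using only the table's contents. It achieves state-of-the-art results on two benchmarks for column type prediction and column relation prediction, with gains of up to 4.0% and 11.9% respectively, and it can outperform the previous state of the art with only 8 tokens per column.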
(summarized by gpt-5-nano on Feb 09 2026)
- Paper ID: 6358
- Venue: SIGMOD
- Year: 2022
- Pagerank: 8.6092139e-05
- Overall Rank: 2,517 | 82.50%
- DOI: 10.1145/3514221.3517906
Incoming Non-self Citations Over Time
Incoming Citations (Sorted by Pagerank)
Showing 23 of 23 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
| --- | --- | --- | --- | --- |
| 2,587 | Table-GPT: Table Fine-tuned GPT for Diverse Table Tasks | 2024 | SIGMOD | 8.4924618e-05 |
| 2,836 | Semantics-aware Dataset Discovery from Data Lakes with Contextualized Column-based Representation Learning | 2023 | VLDB | 8.0443826e-05 |
| 3,000 | SANTOS: Relationship-based Semantic Table Union Search | 2023 | SIGMOD | 7.7462128e-05 |
| 3,015 | Chorus: Foundation Models for Unified Data Discovery and Exploration | 2024 | VLDB | 7.7092391e-05 |
| 3,335 | DeepJoin: Joinable Table Discovery with Pre-trained Language Models | 2023 | VLDB | 7.2065006e-05 |
| 3,876 | The Design of an LLM-powered Unstructured Analytics System | 2025 | CIDR | 6.6741456e-05 |
| 4,859 | Integrating Data Lake Tables | 2023 | VLDB | 5.8732433e-05 |
| 5,023 | GenRewrite: Query Rewriting via Large Language Models | 2026 | SIGMOD | 5.75363e-05 |
| 5,099 | ArcheType: A Novel Framework for Open-Source Column Type Annotation using Large Language Models | 2024 | VLDB | 5.6997784e-05 |
| 5,449 | Transformers for Tabular Data Representation: A Tutorial on Models and Applications | 2022 | VLDB | 5.5008652e-05 |
| 6,092 | Observatory: Characterizing Embeddings of Relational Tables | 2024 | VLDB | 5.2138566e-05 |
| 7,026 | Mind the Data Gap: Bridging LLMs to Enterprise Data Integration | 2025 | CIDR | 4.8570811e-05 |
| 7,048 | Magneto: Combining Small and Large Language Models for Schema Matching | 2025 | VLDB | 4.8520651e-05 |
| 8,579 | RECA: Related Tables Enhanced Column Semantic Type Annotation Framework | 2023 | VLDB | 4.4922446e-05 |
| 8,852 | Watchog: A Light-weight Contrastive Learning based Framework for Column Annotation | 2023 | SIGMOD | 4.4356508e-05 |
| 10,109 | Retrieve-and-Verify: A Table Context Selection Framework for Accurate Column Annotations | 2026 | SIGMOD | 4.1945683e-05 |
| 10,142 | AutoDDG: Automated Dataset Description Generation using Large Language Models | 2026 | SIGMOD | 4.1945683e-05 |
| 10,498 | PLM4NDV: Minimizing Data Access for Number of Distinct Values Estimation with Pre-trained Language Models | 2025 | SIGMOD | 4.1945683e-05 |
| 10,510 | Table Overlap Estimation through Graph Embeddings | 2025 | SIGMOD | 4.1945683e-05 |
| 10,512 | Auto-Test: Learning Semantic-Domain Constraints for Unsupervised Error Detection in Tables | 2025 | SIGMOD | 4.1945683e-05 |
| 10,658 | LLMLog: Advanced Log Template Generation via LLM-driven Multi-Round Annotation | 2025 | VLDB | 4.1945683e-05 |
| 10,753 | Cents: A Flexible and Cost-Effective Framework for LLM-Based Table Understanding | 2025 | VLDB | 4.1945683e-05 |
| 11,205 | Steered Training Data Generation for Learned Semantic Type Detection | 2023 | SIGMOD | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 14 of 14 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
| --- | --- | --- | --- | --- |
| 62 | Freebase: A Collaboratively Created Graph Database For Structuring Human Knowledge | 2008 | SIGMOD | 0.0006429466 |
| 221 | Deep Entity Matching with Pre-Trained Language Models | 2021 | VLDB | 0.00033121824 |
| 364 | Annotating and Searching Web Tables Using Entities, Types and Relationships | 2010 | VLDB | 0.00025637562 |
| 382 | COMA - A system for flexible combination of schema matching approaches | 2002 | VLDB | 0.00024823252 |
| 513 | TURL: Table Understanding through Representation Learning | 2021 | VLDB | 0.00021288342 |
| 1,001 | Recovering Semantics of Tables on the Web | 2011 | VLDB | 0.00014706505 |
| 1,277 | The Data Civilizer System | 2017 | CIDR | 0.00012879695 |
| 1,482 | Automating Large-Scale Data Quality Verification | 2018 | VLDB | 0.00011725533 |
| 1,914 | Creating Embeddings of Heterogeneous Relational Datasets for Data Integration Tasks | 2020 | SIGMOD | 0.00010109102 |
| 2,349 | RPT: Relational Pre-trained Transformer Is Almost All You Need towards Democratizing Data Preparation | 2021 | VLDB | 8.9876423e-05 |
| 2,730 | Open Data Integration | 2018 | VLDB | 8.2126735e-05 |
| 2,888 | Sato: Contextual Semantic Type Detection in Tables | 2020 | VLDB | 7.9594996e-05 |
| 3,823 | Automatic Discovery of Attributes in Relational Databases | 2011 | SIGMOD | 6.7261168e-05 |
| 5,529 | Data-Driven Domain Discovery for Structured Datasets | 2020 | VLDB | 5.4566641e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
| --- | --- | --- | --- | --- |
| 5,449 | Transformers for Tabular Data Representation: A Tutorial on Models and Applications | 2022 | VLDB | 5.5008652e-05 |
| 8,892 | Generation of Training Examples for Tabular Natural Language Inference | 2023 | SIGMOD | 4.4275457e-05 |
| 364 | Annotating and Searching Web Tables Using Entities, Types and Relationships | 2010 | VLDB | 0.00025637562 |
| 8,913 | Making Table Understanding Work in Practice | 2022 | CIDR | 4.427232e-05 |
| 5,099 | ArcheType: A Novel Framework for Open-Source Column Type Annotation using Large Language Models | 2024 | VLDB | 5.6997784e-05 |
| 3,335 | DeepJoin: Joinable Table Discovery with Pre-trained Language Models | 2023 | VLDB | 7.2065006e-05 |
| 8,852 | Watchog: A Light-weight Contrastive Learning based Framework for Column Annotation | 2023 | SIGMOD | 4.4356508e-05 |
| 10,109 | Retrieve-and-Verify: A Table Context Selection Framework for Accurate Column Annotations | 2026 | SIGMOD | 4.1945683e-05 |
| 5,509 | Can Large Language Models Predict Data Correlations from Column Names? | 2023 | VLDB | 5.4703368e-05 |
| 6,800 | DTT: An Example-Driven Tabular Transformer for Joinability by Leveraging Large Language Models | 2024 | SIGMOD | 4.9231471e-05 |