Database Paper Browser

Annotating Columns with Pre-trained Language Models

Summary: Doduo is a multi-task framework built on pre-trained language models that annotates table columns with their types and the relations between them, taking the table as input. It achieves state-of-the-art results on two benchmarks, with gains of up to 4.0% on column types and up to 11.9% on column relations, using 8 tokens per column. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 6358
Venue: SIGMOD
Year: 2022
Pagerank: 8.6092139e-05
Overall Rank: 2,517 | 82.50%
DOI: 10.1145/3514221.3517906

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 23 of 23 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
2,587 | Table-GPT: Table Fine-tuned GPT for Diverse Table Tasks | 2024 | SIGMOD | 8.4924618e-05
2,836 | Semantics-aware Dataset Discovery from Data Lakes with Contextualized Column-based Representation Learning | 2023 | VLDB | 8.0443826e-05
3,000 | SANTOS: Relationship-based Semantic Table Union Search | 2023 | SIGMOD | 7.7462128e-05
3,015 | Chorus: Foundation Models for Unified Data Discovery and Exploration | 2024 | VLDB | 7.7092391e-05
3,335 | DeepJoin: Joinable Table Discovery with Pre-trained Language Models | 2023 | VLDB | 7.2065006e-05
3,876 | The Design of an LLM-powered Unstructured Analytics System | 2025 | CIDR | 6.6741456e-05
4,859 | Integrating Data Lake Tables | 2023 | VLDB | 5.8732433e-05
5,023 | GenRewrite: Query Rewriting via Large Language Models | 2026 | SIGMOD | 5.75363e-05
5,099 | ArcheType: A Novel Framework for Open-Source Column Type Annotation using Large Language Models | 2024 | VLDB | 5.6997784e-05
5,449 | Transformers for Tabular Data Representation: A Tutorial on Models and Applications | 2022 | VLDB | 5.5008652e-05
6,092 | Observatory: Characterizing Embeddings of Relational Tables | 2024 | VLDB | 5.2138566e-05
7,026 | Mind the Data Gap: Bridging LLMs to Enterprise Data Integration | 2025 | CIDR | 4.8570811e-05
7,048 | Magneto: Combining Small and Large Language Models for Schema Matching | 2025 | VLDB | 4.8520651e-05
8,579 | RECA: Related Tables Enhanced Column Semantic Type Annotation Framework | 2023 | VLDB | 4.4922446e-05
8,852 | Watchog: A Light-weight Contrastive Learning based Framework for Column Annotation | 2023 | SIGMOD | 4.4356508e-05
10,109 | Retrieve-and-Verify: A Table Context Selection Framework for Accurate Column Annotations | 2026 | SIGMOD | 4.1945683e-05
10,142 | AutoDDG: Automated Dataset Description Generation using Large Language Models | 2026 | SIGMOD | 4.1945683e-05
10,498 | PLM4NDV: Minimizing Data Access for Number of Distinct Values Estimation with Pre-trained Language Models | 2025 | SIGMOD | 4.1945683e-05
10,510 | Table Overlap Estimation through Graph Embeddings | 2025 | SIGMOD | 4.1945683e-05
10,512 | Auto-Test: Learning Semantic-Domain Constraints for Unsupervised Error Detection in Tables | 2025 | SIGMOD | 4.1945683e-05
10,658 | LLMLog: Advanced Log Template Generation via LLM-driven Multi-Round Annotation | 2025 | VLDB | 4.1945683e-05
10,753 | Cents: A Flexible and Cost-Effective Framework for LLM-Based Table Understanding | 2025 | VLDB | 4.1945683e-05
11,205 | Steered Training Data Generation for Learned Semantic Type Detection | 2023 | SIGMOD | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 14 of 14 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Semantically Similar Papers