Database Paper Browser

Magneto: Combining Small and Large Language Models for Schema Matching

Summary: Magneto uses a two-phase retrieve-and-rerank pipeline: inexpensive small language models (SLMs) generate candidate matches, and a more powerful large language model (LLM) reranks them, achieving high matching accuracy at low compute cost. Novel contributions: self-supervised SLM fine-tuning on LLM-synthesized training data, effective reranking prompts, and a challenging biomedical benchmark. (summarized by gpt-5-mini on Feb 09 2026)
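
The two-phase idea in the summary can be illustrated with a minimal sketch. This is not Magneto's implementation: the function names (trigrams, cheap_score, retrieve, rerank, toy_llm_score), the column names, and the scores in TOY_JUDGMENTS are all hypothetical. Character-trigram Jaccard similarity stands in for the small-model retriever, and a hard-coded lookup stands in for the LLM reranker.

```python
from typing import Callable, Dict, List, Set, Tuple


def trigrams(name: str) -> Set[str]:
    """Character trigrams of a normalized column name (stand-in for an SLM embedding)."""
    padded = f"  {name.lower().replace('_', ' ')} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def cheap_score(a: str, b: str) -> float:
    """Jaccard similarity over trigrams; a placeholder for embedding similarity."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def retrieve(source_cols: List[str], target_cols: List[str], k: int) -> Dict[str, List[str]]:
    """Phase 1: keep the top-k target candidates for each source column."""
    return {
        s: sorted(target_cols, key=lambda t: cheap_score(s, t), reverse=True)[:k]
        for s in source_cols
    }


def rerank(
    candidates: Dict[str, List[str]],
    expensive_score: Callable[[str, str], float],
) -> Dict[str, List[Tuple[str, float]]]:
    """Phase 2: reorder each short candidate list with a costlier scorer."""
    return {
        s: sorted(((t, expensive_score(s, t)) for t in ts), key=lambda p: p[1], reverse=True)
        for s, ts in candidates.items()
    }


# Hypothetical judgments standing in for an LLM's answers; illustrative values only.
TOY_JUDGMENTS = {
    ("visit_dt", "date_of_visit"): 0.9,
    ("visit_dt", "visitor_name"): 0.1,
}


def toy_llm_score(s: str, t: str) -> float:
    """Stand-in for an LLM call; falls back to the cheap score when no judgment exists."""
    return TOY_JUDGMENTS.get((s, t), cheap_score(s, t))


source = ["visit_dt"]
targets = ["patient_identifier", "visitor_name", "body_weight", "date_of_visit"]

candidates = retrieve(source, targets, k=2)     # cheap pass over all targets
reranked = rerank(candidates, toy_llm_score)    # costly pass over only k candidates
print(candidates)
print(reranked)
```

In this sketch the expensive scorer is called only on the k retrieved candidates per source column rather than on every source-target pair, which is where the compute saving in a retrieve-then-rerank design comes from.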

Paper ID: 13911
Venue: VLDB
Year: 2025
Pagerank: 4.8520651e-05
Overall Rank: 7,048 | 50.97%
DOI: 10.14778/3742728.3742757

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 5 of 5 citing papers.

Outgoing Citations (Sorted by Pagerank)

Showing 20 of 20 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
303 | Generic Schema Matching with Cupid | 2001 | VLDB | 0.00028301477
382 | COMA - A system for flexible combination of schema matching approaches | 2002 | VLDB | 0.00024823252
517 | Can Foundation Models Wrangle Your Data? | 2023 | VLDB | 0.00021169035
518 | Data Integration for the Relational Web | 2009 | VLDB | 0.00021158934
968 | Schema and Ontology Matching with COMA++ | 2005 | SIGMOD | 0.0001495703
1,914 | Creating Embeddings of Heterogeneous Relational Datasets for Data Integration Tasks | 2020 | SIGMOD | 0.00010109102
2,517 | Annotating Columns with Pre-trained Language Models | 2022 | SIGMOD | 8.6092139e-05
2,587 | Table-GPT: Table Fine-tuned GPT for Diverse Table Tasks | 2024 | SIGMOD | 8.4924618e-05
2,730 | Open Data Integration | 2018 | VLDB | 8.2126735e-05
2,836 | Semantics-aware Dataset Discovery from Data Lakes with Contextualized Column-based Representation Learning | 2023 | VLDB | 8.0443826e-05
3,000 | SANTOS: Relationship-based Semantic Table Union Search | 2023 | SIGMOD | 7.7462128e-05
3,015 | Chorus: Foundation Models for Unified Data Discovery and Exploration | 2024 | VLDB | 7.7092391e-05
3,335 | DeepJoin: Joinable Table Discovery with Pre-trained Language Models | 2023 | VLDB | 7.2065006e-05
3,823 | Automatic Discovery of Attributes in Relational Databases | 2011 | SIGMOD | 6.7261168e-05
4,212 | Unicorn: A Unified Multi-tasking Model for Supporting Matching Tasks in Data Integration | 2023 | SIGMOD | 6.3555142e-05
5,099 | ArcheType: A Novel Framework for Open-Source Column Type Annotation using Large Language Models | 2024 | VLDB | 5.6997784e-05
5,449 | Transformers for Tabular Data Representation: A Tutorial on Models and Applications | 2022 | VLDB | 5.5008652e-05
5,947 | Top-K Generation of Integrated Schemas Based on Directed and Weighted Correspondences | 2009 | SIGMOD | 5.2614521e-05
7,613 | ADnEV: Cross-Domain Schema Matching using Deep Similarity Matrix Adjustment and Evaluation | 2020 | VLDB | 4.6961059e-05
11,025 | Sampling Methods for Inner Product Sketching | 2024 | VLDB | 4.1945683e-05

Semantically Similar Papers