Magneto: Combining Small and Large Language Models for Schema Matching
Summary: Magneto uses a two-phase retrieval-and-reranking pipeline. Inexpensive small language models (SLMs) generate candidate matches, and more powerful LLMs rerank them, achieving high matching accuracy at modest computational cost. Its novel contributions are self-supervised SLM fine-tuning on LLM-synthesized data, effective reranking prompts, and a challenging biomedical benchmark.
(summarized by gpt-5-mini on Feb 09 2026)
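The retrieve-then-rerank pattern in the summary can be illustrated with a minimal sketch. It is not Magneto's implementation, and it rests on three assumptions:

- Character-trigram cosine similarity stands in for the fine-tuned SLM retriever.
- `head_noun_rerank` is a hypothetical toy heuristic standing in for the LLM reranking prompt.
- All function and column names are illustrative only.

```python
from collections import Counter
from math import sqrt
from typing import Callable

def ngrams(text: str, n: int = 3) -> Counter:
    """Character n-gram bag: a cheap stand-in for SLM column embeddings (assumption)."""
    s = f"#{text.lower().replace('_', ' ')}#"
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(source_col: str, target_cols: list[str], k: int) -> list[str]:
    """Phase 1 (cheap): keep the top-k target columns by similarity; ties break by name."""
    q = ngrams(source_col)
    scored = [(cosine(q, ngrams(t)), t) for t in target_cols]
    scored.sort(key=lambda p: (-p[0], p[1]))
    return [t for _, t in scored[:k]]

def head_noun_rerank(source: str, candidates: list[str]) -> list[str]:
    """Phase 2 placeholder (hypothetical toy heuristic, NOT Magneto's LLM prompt):
    move candidates containing the source column's last word to the front."""
    head = source.lower().split("_")[-1]
    # sorted() is stable, so candidates with equal keys keep the retriever's order
    return sorted(candidates, key=lambda c: head not in c.lower().split("_"))

def match(source_cols: list[str], target_cols: list[str],
          rerank: Callable[[str, list[str]], list[str]], k: int = 2) -> dict[str, list[str]]:
    """Retrieve k candidates per source column, then hand only those k to the reranker."""
    return {s: rerank(s, retrieve(s, target_cols, k)) for s in source_cols}

SOURCES = ["patient_age", "tumor_site"]
TARGETS = ["age_at_diagnosis", "primary_site", "patient_id", "gender"]
result = match(SOURCES, TARGETS, head_noun_rerank)
print(result)
```

The cheap retriever ranks `patient_id` above `age_at_diagnosis` for `patient_age` because of surface overlap. The rerank step reorders that short list. The design point is that the expensive reranker only ever sees `k` candidates per column rather than the whole target schema.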
- Paper ID
- 13911
- Venue
- VLDB
- Year
- 2025
- Pagerank
- 4.8520651e-05
- Overall Rank
- 7,048 | 50.97%
- DOI
- 10.14778/3742728.3742757
Incoming Citations (Sorted by Pagerank)
Showing 5 of 5 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 20 of 20 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 303 | Generic Schema Matching with Cupid | 2001 | VLDB | 0.00028301477 |
| 382 | COMA - A system for flexible combination of schema matching approaches | 2002 | VLDB | 0.00024823252 |
| 517 | Can Foundation Models Wrangle Your Data? | 2023 | VLDB | 0.00021169035 |
| 518 | Data Integration for the Relational Web | 2009 | VLDB | 0.00021158934 |
| 968 | Schema and Ontology Matching with COMA++ | 2005 | SIGMOD | 0.0001495703 |
| 1,914 | Creating Embeddings of Heterogeneous Relational Datasets for Data Integration Tasks | 2020 | SIGMOD | 0.00010109102 |
| 2,517 | Annotating Columns with Pre-trained Language Models | 2022 | SIGMOD | 8.6092139e-05 |
| 2,587 | Table-GPT: Table Fine-tuned GPT for Diverse Table Tasks | 2024 | SIGMOD | 8.4924618e-05 |
| 2,730 | Open Data Integration | 2018 | VLDB | 8.2126735e-05 |
| 2,836 | Semantics-aware Dataset Discovery from Data Lakes with Contextualized Column-based Representation Learning | 2023 | VLDB | 8.0443826e-05 |
| 3,000 | SANTOS: Relationship-based Semantic Table Union Search | 2023 | SIGMOD | 7.7462128e-05 |
| 3,015 | Chorus: Foundation Models for Unified Data Discovery and Exploration | 2024 | VLDB | 7.7092391e-05 |
| 3,335 | DeepJoin: Joinable Table Discovery with Pre-trained Language Models | 2023 | VLDB | 7.2065006e-05 |
| 3,823 | Automatic Discovery of Attributes in Relational Databases | 2011 | SIGMOD | 6.7261168e-05 |
| 4,212 | Unicorn: A Unified Multi-tasking Model for Supporting Matching Tasks in Data Integration | 2023 | SIGMOD | 6.3555142e-05 |
| 5,099 | ArcheType: A Novel Framework for Open-Source Column Type Annotation using Large Language Models | 2024 | VLDB | 5.6997784e-05 |
| 5,449 | Transformers for Tabular Data Representation: A Tutorial on Models and Applications | 2022 | VLDB | 5.5008652e-05 |
| 5,947 | Top-K Generation of Integrated Schemas Based on Directed and Weighted Correspondences | 2009 | SIGMOD | 5.2614521e-05 |
| 7,613 | ADnEV: Cross-Domain Schema Matching using Deep Similarity Matrix Adjustment and Evaluation | 2020 | VLDB | 4.6961059e-05 |
| 11,025 | Sampling Methods for Inner Product Sketching | 2024 | VLDB | 4.1945683e-05 |
Semantically Similar Papers