Magellan: Toward Building Entity Matching Management Systems over Data Science Stacks

Summary: Magellan proposes an entity matching (EM) management system that goes beyond matching algorithms to provide end-to-end tooling on the Python data-science stack. It offers a step-by-step guide, tools covering the full EM pipeline, and an interactive scripting environment for rapid experimentation, and it was evaluated with 44 users. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 11289
Venue: VLDB
Year: 2016
Pagerank: 6.1566477e-05
Overall Rank: 4,462 | 68.99%
DOI: -

Incoming Citations (Sorted by Pagerank)

Showing 10 of 10 citing papers.

Rank	Citing Paper	Year	Venue	Pagerank
516	Can Foundation Models Wrangle Your Data?	2023	VLDB	0.00021194444
705	Magellan: Toward Building Entity Matching Management Systems	2016	VLDB	0.00017779048
1,643	Finding Related Tables in Data Lakes for Interactive Data Science	2020	SIGMOD	0.00011031534
8,096	Saga: A Scalable Framework for Optimizing Data Cleaning Pipelines for Machine Learning Applications	2023	SIGMOD	4.583522e-05
8,732	Unveiling Challenges for LLMs in Enterprise Data Engineering	2026	VLDB	4.4520434e-05
8,913	PromptEM: Prompt-tuning for Low-resource Generalized Entity Matching	2023	VLDB	4.4229886e-05
9,204	Compact, Tamper-Resistant Archival of Fine-Grained Provenance	2021	VLDB	4.3701044e-05
9,846	HyperBlocker: Accelerating Rule-based Blocking in Entity Resolution using GPUs	2025	VLDB	4.2680295e-05
11,050	Blocker and Matcher Can Mutually Benefit: A Co-Learning Framework for Low-Resource Entity Resolution	2024	VLDB	4.1905499e-05
11,232	VersaMatch: Ontology Matching with Weak Supervision	2023	VLDB	4.1905499e-05

Outgoing Citations (Sorted by Pagerank)

Showing 3 of 3 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank	Cited Paper	Year	Venue	Pagerank
94	CrowdDB: Answering Queries with Crowdsourcing	2011	SIGMOD	0.00051273089
705	Magellan: Toward Building Entity Matching Management Systems	2016	VLDB	0.00017779048
1,012	NADEEF: A Commodity Data Cleaning System	2013	SIGMOD	0.00014638349

Semantically Similar Papers

Overall Rank	Paper	Year	Venue	Pagerank
11,532	Valentine in Action: Matching Tabular Data at Scale	2021	VLDB	4.1905499e-05
13,251	DataMingler: A Novel Approach to Data Virtualization	2021	SIGMOD	-
9,462	The Battleship Approach to the Low Resource Entity Matching Problem	2023	SIGMOD	4.3324933e-05
5,872	Demonstration of Panda: A Weakly Supervised Entity Matching System	2021	VLDB	5.2908178e-05
9,023	Entity Matching in the Wild: A Consistent and Versatile Framework to Unify Data in Industrial Applications	2020	SIGMOD	4.4037187e-05
8,825	Analyzing and Revising Data Integration Schemas to Improve Their Matchability	2008	VLDB	4.4371579e-05
11,747	CloudMatcher: A Hands-Off Cloud/Crowd Service for Entity Matching	2018	VLDB	4.1905499e-05
293	Deep Learning for Entity Matching: A Design Space Exploration	2018	SIGMOD	0.00028661817
6,750	Entity Matching Meets Data Science: A Progress Report from the Magellan Project	2019	SIGMOD	4.936137e-05
705	Magellan: Toward Building Entity Matching Management Systems	2016	VLDB	0.00017779048