Bootleg: Chasing the Tail with Self-Supervised Named Entity Disambiguation

Summary: Bootleg is an open-source, self-supervised named entity disambiguation (NED) system that combines a simple transformer architecture with hierarchical regularization. It improves tail-entity disambiguation by up to 41.2 F1 points and matches or exceeds the state of the art on benchmarks. The paper also calls out the serving of entity embeddings and related data-management challenges. (summarized by gpt-5-mini on Feb 09 2026)

Paper ID: 412
Venue: CIDR
Year: 2021
Pagerank: 4.3383453e-05
Overall Rank: 9,443 | 34.38%
DOI: -

Incoming Citations (Sorted by Pagerank)

Showing 4 of 4 citing papers.

Rank	Citing Paper	Year	Venue	Pagerank
3,716	Saga: A Platform for Continuous Construction and Serving of Knowledge At Scale	2022	SIGMOD	6.8170433e-05
3,765	Ember: No-Code Context Enrichment via Similarity-Based Keyless Joins	2022	VLDB	6.7760748e-05
6,225	Managing ML Pipelines: Feature Stores and the Coming Wave of Embedding Ecosystems	2021	VLDB	5.1425119e-05
11,319	Data Management Opportunities for Foundation Models	2022	CIDR	4.1905499e-05

Outgoing Citations (Sorted by Pagerank)

Showing 7 of 7 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank	Cited Paper	Year	Venue	Pagerank
192	HoloClean: Holistic Data Repairs with Probabilistic Inference	2017	VLDB	0.00035692958
252	Snorkel: Rapid Training Data Creation with Weak Supervision	2018	VLDB	0.00030532082
293	Deep Learning for Entity Matching: A Design Space Exploration	2018	SIGMOD	0.00028661817
1,462	ARDA: Automatic Relational Data Augmentation for Machine Learning	2020	VLDB	0.00011866333
4,197	Overton: A Data System for Monitoring and Improving Machine-Learned Products	2020	CIDR	6.3625568e-05
5,040	KBPearl: A Knowledge Base Population System Supported by Joint Entity and Relation Linking	2020	VLDB	5.7361009e-05
7,411	ItemSuggest: A Data Management Platform for Machine Learned Ranking Services	2019	CIDR	4.7319005e-05

Semantically Similar Papers

Overall Rank	Paper	Year	Venue	Pagerank
9,415	Ground Truth Inference for Weakly Supervised Entity Matching	2023	SIGMOD	4.3399748e-05
11,258	Self-Training for Label-Efficient Information Extraction from Semi-Structured Web-Pages	2023	VLDB	4.1905499e-05
3,469	Deep Learning for Blocking in Entity Matching: A Design Space Exploration	2021	VLDB	7.0629476e-05
6,558	Pre-trained Embeddings for Entity Resolution: An Experimental Analysis	2023	VLDB	5.0060112e-05
4,967	Supervised Meta-blocking	2014	VLDB	5.7939544e-05
5,214	Dual-Objective Fine-Tuning of BERT for Entity Matching	2021	VLDB	5.6236713e-05
4,701	Medical Entity Disambiguation Using Graph Neural Networks	2021	SIGMOD	5.9797526e-05
2,758	A Comprehensive Benchmark Framework for Active Learning Methods in Entity Matching	2020	SIGMOD	8.1668285e-05
219	Deep Entity Matching with Pre-Trained Language Models	2021	VLDB	0.00033354456
3,583	Efficient Approximate Entity Extraction with Edit Distance Constraints	2009	SIGMOD	6.944299e-05