Bootleg: Chasing the Tail with Self-Supervised Named Entity Disambiguation
Summary: Bootleg is an open-source, self-supervised named entity disambiguation (NED) system that uses a simple transformer with hierarchical regularization to improve tail-entity disambiguation (by up to 41.2 F1 points) while matching or exceeding the state of the art on benchmarks. The paper also calls out the serving of entity embeddings and related data-management challenges. (summarized by gpt-5-mini on Feb 09 2026)
Authors
- 1. Laurel Orr
- 2. Megan Leszczynski
- 3. Neel Guha
- 4. Sen Wu
- 5. Simran Arora
- 6. Xiao Ling
- 7. Christopher Ré
Incoming Citations (Sorted by Pagerank)
Showing 4 of 4 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 3,711 | Saga: A Platform for Continuous Construction and Serving of Knowledge At Scale | 2022 | SIGMOD | 6.823609e-05 |
| 3,942 | Ember: No-Code Context Enrichment via Similarity-Based Keyless Joins | 2022 | VLDB | 6.6114622e-05 |
| 6,228 | Managing ML Pipelines: Feature Stores and the Coming Wave of Embedding Ecosystems | 2021 | VLDB | 5.1470042e-05 |
| 11,317 | Data Management Opportunities for Foundation Models | 2022 | CIDR | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 7 of 7 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 192 | HoloClean: Holistic Data Repairs with Probabilistic Inference | 2017 | VLDB | 0.00035728858 |
| 254 | Snorkel: Rapid Training Data Creation with Weak Supervision | 2018 | VLDB | 0.00030540555 |
| 300 | Deep Learning for Entity Matching: A Design Space Exploration | 2018 | SIGMOD | 0.00028441466 |
| 1,463 | ARDA: Automatic Relational Data Augmentation for Machine Learning | 2020 | VLDB | 0.00011869295 |
| 4,196 | Overton: A Data System for Monitoring and Improving Machine-Learned Products | 2020 | CIDR | 6.3686231e-05 |
| 5,041 | KBPearl: A Knowledge Base Population System Supported by Joint Entity and Relation Linking | 2020 | VLDB | 5.741618e-05 |
| 7,411 | ItemSuggest: A Data Management Platform for Machine Learned Ranking Services | 2019 | CIDR | 4.7364436e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 9,409 | Ground Truth Inference for Weakly Supervised Entity Matching | 2023 | SIGMOD | 4.3441378e-05 |
| 11,256 | Self-Training for Label-Efficient Information Extraction from Semi-Structured Web-Pages | 2023 | VLDB | 4.1945683e-05 |
| 3,640 | Deep Learning for Blocking in Entity Matching: A Design Space Exploration | 2021 | VLDB | 6.8891671e-05 |
| 7,052 | Pre-trained Embeddings for Entity Resolution: An Experimental Analysis | 2023 | VLDB | 4.8497453e-05 |
| 4,974 | Supervised Meta-blocking | 2014 | VLDB | 5.7903293e-05 |
| 5,533 | Dual-Objective Fine-Tuning of BERT for Entity Matching | 2021 | VLDB | 5.4544359e-05 |
| 4,703 | Medical Entity Disambiguation Using Graph Neural Networks | 2021 | SIGMOD | 5.9855056e-05 |
| 2,767 | A Comprehensive Benchmark Framework for Active Learning Methods in Entity Matching | 2020 | SIGMOD | 8.1513883e-05 |
| 221 | Deep Entity Matching with Pre-Trained Language Models | 2021 | VLDB | 0.00033121824 |
| 3,578 | Efficient Approximate Entity Extraction with Edit Distance Constraints | 2009 | SIGMOD | 6.9503858e-05 |