Glean: Structured Extractions from Templatic Documents
Summary: Glean extracts structured data from templatic documents and generalizes to unseen layouts. Its focus is on data management: high-quality ground truth, efficient training-data generation, and rapid tooling, which together yield a gain of about 5 F1 points on the same model. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Sandeep Tata
2. Navneet Potti
3. James B. Wendt
4. Lauro Beltrão Costa
5. Marc Najork
6. Beliz Gunel
Incoming Citations (Sorted by Pagerank)
Showing 1 of 1 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 10,126 | Visual Template Inference for Data Extraction from Documents | 2026 | SIGMOD | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 7 of 7 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 254 | Snorkel: Rapid Training Data Creation with Weak Supervision | 2018 | VLDB | 3.0540555e-04 |
| 1,277 | The Data Civilizer System | 2017 | CIDR | 1.2879695e-04 |
| 1,317 | Harvesting Relational Tables from Lists on the Web | 2009 | VLDB | 1.2625853e-04 |
| 3,303 | Fonduer: Knowledge Base Construction from Richly Formatted Data | 2018 | SIGMOD | 7.2487486e-05 |
| 5,684 | Dagger: A Data (not code) Debugger | 2020 | CIDR | 5.3720749e-05 |
| 7,397 | A Probabilistic Approach for Automatically Filling Form-Based Web Interfaces | 2011 | VLDB | 4.7417648e-05 |
| 8,461 | Visual Segmentation for Information Extraction from Heterogeneous Visually Rich Documents | 2019 | SIGMOD | 4.5061205e-05 |