Visual Segmentation for Information Extraction from Heterogeneous Visually Rich Documents
Summary: VS2 segments visually rich documents into logical blocks using document-type-agnostic cues. A distantly supervised search-and-select method then uses the block boundaries to locate entities, outperforming text-only IE on three heterogeneous datasets. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Ritesh Sarkhel
2. Arnab Nandi
Incoming Citations (Sorted by Pagerank)
Showing 3 of 3 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 9,252 | Improving Information Extraction from Visually Rich Documents using Visual Span Representations | 2021 | VLDB | 4.3690661e-05 |
| 9,253 | Glean: Structured Extractions from Templatic Documents | 2021 | VLDB | 4.3690661e-05 |
| 11,256 | Self-Training for Label-Efficient Information Extraction from Semi-Structured Web-Pages | 2023 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 4 of 4 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 533 | RoadRunner: Towards Automatic Data Extraction from Large Web Sites | 2001 | VLDB | 0.00020757722 |
| 3,303 | Fonduer: Knowledge Base Construction from Richly Formatted Data | 2018 | SIGMOD | 7.2487486e-05 |
| 3,820 | Enterprise Information Extraction: Recent Developments and Open Challenges | 2010 | SIGMOD | 6.7299199e-05 |
| 6,135 | Extracting Logical Hierarchical Structure of HTML Documents Based on Headings | 2015 | VLDB | 5.1930114e-05 |