Schema Extraction for Tabular Data on the Web
Summary: CRF-based schema extraction for tabular Web data using a novel logarithmic binning feature encoding to infer table construction from nearby rows. Extends beyond HTML tables to full tables (including spreadsheets), extracting row groups and surpassing WebTables in schema accuracy. (summarized by gpt-5-nano on Feb 09 2026)
Incoming Non-self Citations Over Time
Authors
- 1. Marco D. Adelfio
- 2. Hanan Samet
Incoming Citations (Sorted by Pagerank)
Showing 9 of 9 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 939 | Data Lake Management: Challenges and Opportunities | 2019 | VLDB | 0.00015187344 |
| 2,269 | Ground: A Data Context Service | 2017 | CIDR | 9.147379e-05 |
| 2,836 | Semantics-aware Dataset Discovery from Data Lakes with Contextualized Column-based Representation Learning | 2023 | VLDB | 8.0443826e-05 |
| 3,000 | SANTOS: Relationship-based Semantic Table Union Search | 2023 | SIGMOD | 7.7462128e-05 |
| 3,288 | Biperpedia: An Ontology for Search Applications | 2014 | VLDB | 7.273034e-05 |
| 3,963 | Pytheas: Pattern-based Table Discovery in CSV Files | 2020 | VLDB | 6.5840643e-05 |
| 6,135 | Extracting Logical Hierarchical Structure of HTML Documents Based on Headings | 2015 | VLDB | 5.1930114e-05 |
| 8,135 | Applying WebTables in Practice | 2015 | CIDR | 4.5777549e-05 |
| 10,589 | Birdie: Natural Language-Driven Table Discovery Using Differentiable Search Index | 2025 | VLDB | 4.1945683e-05 |
Previous
Page 1 / 1
Next
Outgoing Citations (Sorted by Pagerank)
Showing 8 of 8 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 107 | WebTables: Exploring the Power of Tables on the Web | 2008 | VLDB | 0.00048377684 |
| 364 | Annotating and Searching Web Tables Using Entities, Types and Relationships | 2010 | VLDB | 0.00025637562 |
| 420 | InfoGather: Entity Augmentation and Attribute Discovery By Holistic Matching with Web Tables | 2012 | SIGMOD | 0.00023719065 |
| 518 | Data Integration for the Relational Web | 2009 | VLDB | 0.00021158934 |
| 818 | Finding Related Tables | 2012 | SIGMOD | 0.00016311524 |
| 1,001 | Recovering Semantics of Tables on the Web | 2011 | VLDB | 0.00014706505 |
| 1,367 | Answering Table Queries on the Web using Column Keywords | 2012 | VLDB | 0.00012349783 |
| 1,527 | Generic Schema Matching, Ten Years Later | 2011 | VLDB | 0.00011499442 |
Previous
Page 1 / 1
Next
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 902 | Statistical Schema Matching across Web Query Interfaces | 2003 | SIGMOD | 0.00015486247 |
| 12,223 | Schema Clustering and Retrieval for Multi-domain Pay-As-You-Go Data Integration Systems | 2010 | SIGMOD | 4.1945683e-05 |
| 3,742 | TEGRA: Table Extraction by Global Record Alignment | 2015 | SIGMOD | 6.7966898e-05 |
| 1,367 | Answering Table Queries on the Web using Column Keywords | 2012 | VLDB | 0.00012349783 |
| 364 | Annotating and Searching Web Tables Using Entities, Types and Relationships | 2010 | VLDB | 0.00025637562 |
| 1,851 | An Analysis of Structured Data on the Web | 2012 | VLDB | 0.00010327871 |
| 1,001 | Recovering Semantics of Tables on the Web | 2011 | VLDB | 0.00014706505 |
| 1,317 | Harvesting Relational Tables from Lists on the Web | 2009 | VLDB | 0.00012625853 |
| 3,285 | Using the Structure of Web Sites for Automatic Segmentation of Tables | 2004 | SIGMOD | 7.2759001e-05 |
| 107 | WebTables: Exploring the Power of Tables on the Web | 2008 | VLDB | 0.00048377684 |