Database Paper Browser


WebTables: Exploring the Power of Tables on the Web

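Summary: WebTables extracts 14.1B HTML tables from a general-purpose web crawl and identifies roughly 154M of them that contain relational data, treating each as a small database. It introduces the ACSDb (attribute correlation statistics database), which records how often attributes co-occur across the schemas in the corpus. These statistics improve table search and enable tools such as schema auto-complete, attribute synonym discovery, and join-graph traversal. (summarized by gpt-5-nano on Feb 09 2026)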
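The ACSDb idea can be illustrated with a short sketch. This is a minimal Python illustration under stated assumptions, not the paper's implementation: the toy schemas, the helper names (`build_acsdb`, `prob`, `cond_prob`, `autocomplete`), and the 0.5 threshold are hypothetical. Schema frequencies are counted, attribute co-occurrence probabilities are derived from them, and a simplified greedy loop adds the attribute with the highest conditional probability given the current schema until that probability falls below the threshold.

```python
from collections import Counter


def build_acsdb(schemas):
    """Count how often each distinct schema (set of column labels) occurs.

    Hypothetical input: one entry per extracted table; repeated schemas
    are simply counted more than once.
    """
    return Counter(frozenset(s) for s in schemas)


def prob(acsdb, attrs):
    """Fraction of all counted schemas that contain every attribute in attrs."""
    total = sum(acsdb.values())
    hits = sum(n for schema, n in acsdb.items() if set(attrs) <= schema)
    return hits / total if total else 0.0


def cond_prob(acsdb, a, given):
    """p(a | given) = p(given plus a) / p(given); 0.0 if 'given' never occurs."""
    given = frozenset(given)
    denom = prob(acsdb, given)
    return prob(acsdb, given | {a}) / denom if denom else 0.0


def autocomplete(acsdb, seed, threshold=0.5):
    """Greedy schema auto-complete (simplified, assumed variant).

    Repeatedly adds the attribute with the highest p(a | current schema);
    ties are broken by attribute name so the result is deterministic.
    Stops when the best candidate falls below the threshold.
    """
    current = frozenset(seed)
    vocab = set().union(*acsdb)
    while True:
        candidates = vocab - current
        if not candidates:
            return current
        best = max(candidates, key=lambda a: (cond_prob(acsdb, a, current), a))
        if cond_prob(acsdb, best, current) < threshold:
            return current
        current = current | {best}


if __name__ == "__main__":
    # Hypothetical toy corpus of table schemas.
    schemas = [
        {"make", "model", "year"},
        {"make", "model", "year"},
        {"make", "model", "price"},
        {"name", "phone", "address"},
    ]
    acsdb = build_acsdb(schemas)
    print(sorted(autocomplete(acsdb, {"make"})))  # ['make', 'model', 'year']
```

Starting from `make`, the sketch first adds `model` (it appears in every schema that has `make`), then `year` (two of the three remaining matches), and stops because no schema extends `{make, model, year}` further.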

Paper ID: 9694
Venue: VLDB
Year: 2008
PageRank: 0.00048377684
Overall Rank: 107 | 99.26%
DOI: -

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by PageRank)

Showing 50 of 84 citing papers.

Overall Rank | Citing Paper | Year | Venue | PageRank
364 | Annotating and Searching Web Tables Using Entities, Types and Relationships | 2010 | VLDB | 0.00025637562
420 | InfoGather: Entity Augmentation and Attribute Discovery By Holistic Matching with Web Tables | 2012 | SIGMOD | 0.00023719065
489 | Data Curation at Scale: The Data Tamer System | 2013 | CIDR | 0.00022030728
513 | TURL: Table Understanding through Representation Learning | 2021 | VLDB | 0.00021288342
518 | Data Integration for the Relational Web | 2009 | VLDB | 0.00021158934
610 | Goods: Organizing Google's Datasets | 2016 | SIGMOD | 0.00019232674
818 | Finding Related Tables | 2012 | SIGMOD | 0.00016311524
883 | Google Fusion Tables: Web-Centered Data Management and Collaboration | 2010 | SIGMOD | 0.00015656548
939 | Data Lake Management: Challenges and Opportunities | 2019 | VLDB | 0.00015187344
1,001 | Recovering Semantics of Tables on the Web | 2011 | VLDB | 0.00014706505
1,178 | Table Union Search on Open Data | 2018 | VLDB | 0.00013468118
1,187 | JOSIE: Overlap Set Similarity Search for Finding Joinable Tables in Data Lakes | 2019 | SIGMOD | 0.00013443639
1,221 | A Web of Concepts | 2009 | PODS | 0.00013219242
1,267 | Foofah: Transforming Data By Example | 2017 | SIGMOD | 0.00012936483
1,317 | Harvesting Relational Tables from Lists on the Web | 2009 | VLDB | 0.00012625853
1,367 | Answering Table Queries on the Web using Column Keywords | 2012 | VLDB | 0.00012349783
1,644 | Finding Related Tables in Data Lakes for Interactive Data Science | 2020 | SIGMOD | 0.00011041787
1,851 | An Analysis of Structured Data on the Web | 2012 | VLDB | 0.00010327871
2,141 | LSH Ensemble: Internet-Scale Domain Search | 2016 | VLDB | 9.4542625e-05
2,158 | Uni-Detect: A Unified Approach to Automated Error Detection in Tables | 2019 | SIGMOD | 9.4141354e-05
2,269 | Ground: A Data Context Service | 2017 | CIDR | 9.147379e-05
2,319 | Expressive and Flexible Access to Web-Extracted Data: A Keyword-based Structured Query Language | 2010 | SIGMOD | 9.0387108e-05
2,420 | From Data Fusion to Knowledge Fusion | 2014 | VLDB | 8.8530994e-05
2,506 | Auto-Detect: Data-Driven Error Detection in Tables | 2018 | SIGMOD | 8.6335464e-05
2,617 | Extraction and Integration of Partially Overlapping Web Sources | 2013 | VLDB | 8.4462621e-05
2,633 | Schema Extraction for Tabular Data on the Web | 2013 | VLDB | 8.4063569e-05
2,730 | Open Data Integration | 2018 | VLDB | 8.2126735e-05
2,836 | Semantics-aware Dataset Discovery from Data Lakes with Contextualized Column-based Representation Learning | 2023 | VLDB | 8.0443826e-05
2,888 | Sato: Contextual Semantic Type Detection in Tables | 2020 | VLDB | 7.9594996e-05
3,015 | Chorus: Foundation Models for Unified Data Discovery and Exploration | 2024 | VLDB | 7.7092391e-05
3,155 | Ten Years of WebTables | 2018 | VLDB | 7.4672742e-05
3,229 | InfoGather+: Semantic Matching and Annotation of Numeric and Time-Varying Attributes in Web Tables | 2013 | SIGMOD | 7.3393682e-05
3,288 | Biperpedia: An Ontology for Search Applications | 2014 | VLDB | 7.273034e-05
3,520 | GitTables: A Large-Scale Corpus of Relational Tables | 2023 | SIGMOD | 7.0131061e-05
3,678 | Automatic Wrappers for Large Scale Web Extraction | 2011 | VLDB | 6.8517545e-05
3,690 | Navigating the Data Lake with DATAMARAN: Automatically Extracting Structure from Log Datasets | 2018 | SIGMOD | 6.8384476e-05
3,742 | TEGRA: Table Extraction by Global Record Alignment | 2015 | SIGMOD | 6.7966898e-05
3,797 | Stitching Web Tables for Improving Matching Quality | 2017 | VLDB | 6.7597149e-05
3,824 | Correlation Sketches for Approximate Join-Correlation Queries | 2021 | SIGMOD | 6.7260705e-05
3,931 | Extracting and Querying a Comprehensive Web Database | 2009 | CIDR | 6.6193836e-05
3,935 | CrowdQ: Crowdsourced Query Understanding | 2013 | CIDR | 6.6163464e-05
3,963 | Pytheas: Pattern-based Table Discovery in CSV Files | 2020 | VLDB | 6.5840643e-05
3,985 | A First Tutorial on Dataspaces | 2008 | VLDB | 6.5626153e-05
4,092 | Structured Annotations of Web Queries | 2010 | SIGMOD | 6.4561959e-05
4,229 | Harnessing the Deep Web: Present and Future | 2009 | CIDR | 6.3399547e-05
4,521 | A Temporal-Probabilistic Database Model for Information Extraction | 2013 | VLDB | 6.1168322e-05
5,099 | ArcheType: A Novel Framework for Open-Source Column Type Annotation using Large Language Models | 2024 | VLDB | 5.6997784e-05
5,563 | AnyLog: a Grand Unification of the Internet of Things | 2020 | CIDR | 5.4328568e-05
5,652 | From Information to Knowledge: Harvesting Entities and Relationships from Web Sources | 2010 | PODS | 5.3903671e-05
5,928 | SchemaPile: A Large Collection of Relational Database Schemas | 2024 | SIGMOD | 5.2685946e-05

Outgoing Citations (Sorted by PageRank)

Showing 6 of 6 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.


Semantically Similar Papers

Overall Rank | Paper | Year | Venue | PageRank
1,851 | An Analysis of Structured Data on the Web | 2012 | VLDB | 0.00010327871
5,672 | Effective Keyword-based Selection of Relational Databases | 2007 | SIGMOD | 5.3784128e-05
7,326 | Answering Web Queries Using Structured Data Sources | 2009 | SIGMOD | 4.7612871e-05
818 | Finding Related Tables | 2012 | SIGMOD | 0.00016311524
1,317 | Harvesting Relational Tables from Lists on the Web | 2009 | VLDB | 0.00012625853
1,001 | Recovering Semantics of Tables on the Web | 2011 | VLDB | 0.00014706505
1,367 | Answering Table Queries on the Web using Column Keywords | 2012 | VLDB | 0.00012349783
2,633 | Schema Extraction for Tabular Data on the Web | 2013 | VLDB | 8.4063569e-05
364 | Annotating and Searching Web Tables Using Entities, Types and Relationships | 2010 | VLDB | 0.00025637562
8,135 | Applying WebTables in Practice | 2015 | CIDR | 4.5777549e-05