An Analysis of Structured Data on the Web
Summary: A Web-scale analysis of structured data that quantifies its value and distribution across top aggregators versus tail sites in multiple domains. Described as the first study of its kind, it offers new insights for Web information extraction and data management. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Nilesh Dalvi
2. Ashwin Machanavajjhala
3. Bo Pang
Incoming Citations (Sorted by Pagerank)
Showing 10 of 10 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1,211 | Truth Finding on the Deep Web: Is the Problem Solved? | 2013 | VLDB | 0.00013257101 |
| 2,420 | From Data Fusion to Knowledge Fusion | 2014 | VLDB | 8.8530994e-05 |
| 2,617 | Extraction and Integration of Partially Overlapping Web Sources | 2013 | VLDB | 8.4462621e-05 |
| 6,099 | WOO: A Scalable and Multi-tenant Platform for Continuous Knowledge Base Synthesis | 2013 | VLDB | 5.2104516e-05 |
| 6,133 | DIADEM: Thousands of Websites to a Single Database | 2014 | VLDB | 5.1954702e-05 |
| 7,919 | DEXTER: Large-Scale Discovery and Extraction of Product Specifications on the Web | 2015 | VLDB | 4.616746e-05 |
| 10,126 | Visual Template Inference for Data Extraction from Documents | 2026 | SIGMOD | 4.1945683e-05 |
| 11,538 | Quality of Sentiment Analysis Tools: The Reasons of Inconsistency | 2021 | VLDB | 4.1945683e-05 |
| 11,706 | Big Data Linkage for Product Specification Pages | 2018 | SIGMOD | 4.1945683e-05 |
| 11,985 | Online Ordering of Overlapping Data Sources | 2014 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 9 of 9 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 107 | WebTables: Exploring the Power of Tables on the Web | 2008 | VLDB | 0.00048377684 |
| 533 | RoadRunner: Towards Automatic Data Extraction from Large Web Sites | 2001 | VLDB | 0.00020757722 |
| 587 | Extracting Structured Data from Web Pages | 2003 | SIGMOD | 0.00019648348 |
| 1,317 | Harvesting Relational Tables from Lists on the Web | 2009 | VLDB | 0.00012625853 |
| 1,585 | Answering Table Augmentation Queries from Unstructured Lists on the Web | 2009 | VLDB | 0.00011255098 |
| 1,722 | Building Structured Web Community Portals: A Top-Down, Compositional, and Incremental Approach | 2007 | VLDB | 0.00010757784 |
| 3,678 | Automatic Wrappers for Large Scale Web Extraction | 2011 | VLDB | 6.8517545e-05 |
| 4,137 | Exploiting Content Redundancy for Web Information Extraction | 2010 | VLDB | 6.4181549e-05 |
| 4,229 | Harnessing the Deep Web: Present and Future | 2009 | CIDR | 6.3399547e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 4,092 | Structured Annotations of Web Queries | 2010 | SIGMOD | 6.4561959e-05 |
| 12,590 | An Automatic Data Grabber for Large Web Sites | 2004 | VLDB | 4.1945683e-05 |
| 7,326 | Answering Web Queries Using Structured Data Sources | 2009 | SIGMOD | 4.7612871e-05 |
| 2,617 | Extraction and Integration of Partially Overlapping Web Sources | 2013 | VLDB | 8.4462621e-05 |
| 1,395 | Structured Querying of Web Text: A Technical Challenge | 2007 | CIDR | 0.00012207039 |
| 107 | WebTables: Exploring the Power of Tables on the Web | 2008 | VLDB | 0.00048377684 |
| 12,669 | Self-similarity in the web | 2001 | VLDB | 4.1945683e-05 |
| 2,633 | Schema Extraction for Tabular Data on the Web | 2013 | VLDB | 8.4063569e-05 |
| 587 | Extracting Structured Data from Web Pages | 2003 | SIGMOD | 0.00019648348 |
| 3,285 | Using the Structure of Web Sites for Automatic Segmentation of Tables | 2004 | SIGMOD | 7.2759001e-05 |