Using the Structure of Web Sites for Automatic Segmentation of Tables
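Summary: The paper addresses automatic extraction and segmentation of records from web tables without user input. It exploits common table and list layouts together with links from each record to its detail page. Two algorithms are presented: a constraint-based one that casts segmentation as a constraint satisfaction problem (CSP) using detail-page constraints, and one based on probabilistic inference. The approach is domain-independent and was tested on twelve sites. (summarized by gpt-5-nano on Feb 09 2026)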
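To make the detail-page idea in the summary concrete, here is a minimal sketch of segmentation as a small constraint satisfaction problem. It is not the authors' formulation, which this page does not give. The function name `segment`, the two constraints, and the sample data are all assumptions chosen for illustration: each list-page extract may only be assigned to a record whose detail page contains it, and record indices may not decrease along the list page.

```python
from typing import List, Optional


def segment(extracts: List[str], detail_pages: List[str]) -> Optional[List[int]]:
    """Assign each list-page extract to a record index (hypothetical helper).

    Constraints assumed for this sketch:
      1. an extract can only belong to a record whose detail page contains it;
      2. record indices never decrease along the list page.
    Returns one consistent assignment, or None if none exists.
    """
    # Candidate records for each extract, derived from the detail-page constraint.
    domains = [
        [r for r, page in enumerate(detail_pages) if text in page]
        for text in extracts
    ]
    assignment: List[int] = []

    def backtrack(i: int) -> bool:
        if i == len(extracts):
            return True
        lowest = assignment[-1] if assignment else 0
        for r in domains[i]:
            if r < lowest:
                continue  # would violate the ordering constraint
            assignment.append(r)
            if backtrack(i + 1):
                return True
            assignment.pop()  # undo and try the next candidate record
        return False

    return assignment if backtrack(0) else None


# Made-up data: "New York" appears on both detail pages, so it is ambiguous
# until the ordering constraint is applied.
extracts = ["John Smith", "New York", "555-1234", "Jane Doe", "New York", "555-9876"]
detail_pages = [
    "John Smith, 12 Main St, New York, phone 555-1234",
    "Jane Doe, 9 Elm Ave, New York, phone 555-9876",
]
print(segment(extracts, detail_pages))  # [0, 0, 0, 1, 1, 1]
```

Plain backtracking is used here only for brevity. A probabilistic variant would score candidate assignments instead of accepting the first consistent one.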
Authors
1. Kristina Lerman
2. Lise Getoor
3. Steven Minton
4. Craig Knoblock
Incoming Citations (Sorted by Pagerank)
Showing 5 of 5 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1,317 | Harvesting Relational Tables from Lists on the Web | 2009 | VLDB | 0.00012625853 |
| 3,747 | Context-Aware Wrapping: Synchronized Data Extraction | 2007 | VLDB | 6.7917216e-05 |
| 4,707 | Object-level Vertical Search | 2007 | CIDR | 5.9810753e-05 |
| 7,424 | Table Extraction and Understanding for Scientific and Enterprise Applications | 2020 | VLDB | 4.7339251e-05 |
| 8,088 | PIDS: Attribute Decomposition for Improved Compression and Query Performance in Columnar Storage | 2020 | VLDB | 4.5897316e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 5 of 5 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 234 | Crawling the Hidden Web | 2001 | VLDB | 0.00032018108 |
| 533 | RoadRunner: Towards Automatic Data Extraction from Large Web Sites | 2001 | VLDB | 0.00020757722 |
| 587 | Extracting Structured Data from Web Pages | 2003 | SIGMOD | 0.00019648348 |
| 637 | Automatic segmentation of text into structured records | 2001 | SIGMOD | 0.00018824614 |
| 2,005 | Record-Boundary Discovery in Web Documents | 1999 | SIGMOD | 9.8112591e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 2,005 | Record-Boundary Discovery in Web Documents | 1999 | SIGMOD | 9.8112591e-05 |
| 12,525 | Automatic Extraction of Dynamic Record Sections From Search Engine Result Pages | 2006 | VLDB | 4.1945683e-05 |
| 12,590 | An Automatic Data Grabber for Large Web Sites | 2004 | VLDB | 4.1945683e-05 |
| 1,851 | An Analysis of Structured Data on the Web | 2012 | VLDB | 0.00010327871 |
| 1,367 | Answering Table Queries on the Web using Column Keywords | 2012 | VLDB | 0.00012349783 |
| 1,585 | Answering Table Augmentation Queries from Unstructured Lists on the Web | 2009 | VLDB | 0.00011255098 |
| 4,137 | Exploiting Content Redundancy for Web Information Extraction | 2010 | VLDB | 6.4181549e-05 |
| 2,633 | Schema Extraction for Tabular Data on the Web | 2013 | VLDB | 8.4063569e-05 |
| 1,317 | Harvesting Relational Tables from Lists on the Web | 2009 | VLDB | 0.00012625853 |
| 587 | Extracting Structured Data from Web Pages | 2003 | SIGMOD | 0.00019648348 |