RoadRunner: Automatic Data Extraction from Data-Intensive Web Sites
Summary: RoadRunner applies a matching technique to auto-generate a common wrapper/schema from pages in the same class. Uses Approximate and Partial Matching to cope with irregular HTML, yielding a single wrapper/schema for varied pages. (summarized by gpt-5-nano on Feb 09 2026)
Incoming Non-self Citations Over Time
Authors
Incoming Citations (Sorted by Pagerank)
Showing 3 of 3 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 2,224 | The SphereSearch Engine for Unified Ranked Retrieval of Heterogeneous XML and Web Documents | 2005 | VLDB | 9.251962e-05 |
| 12,589 | COMPASS: A Concept-based Web Search Engine for HTML, XML, and Deep Web Data | 2004 | VLDB | 4.1945683e-05 |
| 12,590 | An Automatic Data Grabber for Large Web Sites | 2004 | VLDB | 4.1945683e-05 |
Previous
Page 1 / 1
Next
Outgoing Citations (Sorted by Pagerank)
Showing 1 of 1 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 533 | RoadRunner: Towards Automatic Data Extraction from Large Web Sites | 2001 | VLDB | 0.00020757722 |
Previous
Page 1 / 1
Next
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 3,285 | Using the Structure of Web Sites for Automatic Segmentation of Tables | 2004 | SIGMOD | 7.2759001e-05 |
| 2,617 | Extraction and Integration of Partially Overlapping Web Sources | 2013 | VLDB | 8.4462621e-05 |
| 4,137 | Exploiting Content Redundancy for Web Information Extraction | 2010 | VLDB | 6.4181549e-05 |
| 587 | Extracting Structured Data from Web Pages | 2003 | SIGMOD | 0.00019648348 |
| 6,751 | Optimal Schemes for Robust Web Extraction | 2011 | VLDB | 4.939042e-05 |
| 12,525 | Automatic Extraction of Dynamic Record Sections From Search Engine Result Pages | 2006 | VLDB | 4.1945683e-05 |
| 3,678 | Automatic Wrappers for Large Scale Web Extraction | 2011 | VLDB | 6.8517545e-05 |
| 12,258 | ObjectRunner: Lightweight, Targeted Extraction and Querying of Structured Web Data | 2010 | VLDB | 4.1945683e-05 |
| 12,590 | An Automatic Data Grabber for Large Web Sites | 2004 | VLDB | 4.1945683e-05 |
| 533 | RoadRunner: Towards Automatic Data Extraction from Large Web Sites | 2001 | VLDB | 0.00020757722 |