RoadRunner: Towards Automatic Data Extraction from Large Web Sites
Summary: RoadRunner enables automatic data extraction from large web sites by generating wrappers from an analysis of the similarities and differences among HTML pages. Experiments on real-world data-intensive sites demonstrate the feasibility and scalability of the wrapper generation approach. (summarized by gpt-5-nano on Feb 09 2026)
Incoming Citations (Sorted by Pagerank)
Showing 31 of 31 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 3 of 3 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 385 | NoDoSE - A Tool for Semi-Automatically Extracting Structured and Semistructured Data from Text Documents. | 1998 | SIGMOD | 0.00024795739 |
| 1,919 | Cut and Paste | 1997 | PODS | 0.00010094755 |
| 2,204 | To Weave the Web | 1997 | VLDB | 9.2970809e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 2,005 | Record-Boundary Discovery in Web Documents | 1999 | SIGMOD | 9.8112591e-05 |
| 587 | Extracting Structured Data from Web Pages | 2003 | SIGMOD | 0.00019648348 |
| 4,440 | Robust Web Extraction: An Approach Based on a Probabilistic Tree-Edit Model | 2009 | SIGMOD | 6.187819e-05 |
| 8,322 | An XML-based Wrapper Generator for Web Information Extraction | 1999 | SIGMOD | 4.5435639e-05 |
| 12,525 | Automatic Extraction of Dynamic Record Sections From Search Engine Result Pages | 2006 | VLDB | 4.1945683e-05 |
| 6,751 | Optimal Schemes for Robust Web Extraction | 2011 | VLDB | 4.939042e-05 |
| 3,678 | Automatic Wrappers for Large Scale Web Extraction | 2011 | VLDB | 6.8517545e-05 |
| 12,258 | ObjectRunner: Lightweight, Targeted Extraction and Querying of Structured Web Data | 2010 | VLDB | 4.1945683e-05 |
| 12,590 | An Automatic Data Grabber for Large Web Sites | 2004 | VLDB | 4.1945683e-05 |
| 6,403 | RoadRunner: Automatic Data Extraction from Data-Intensive Web Sites | 2002 | SIGMOD | 5.0797045e-05 |