WADaR: Joint Wrapper and Data Repair
Summary: WADaR is a scalable tool for jointly repairing wrappers and the web-scraped relations they extract. It uses off-the-shelf entity recognizers to locate target attribute values and Markov-chain-based repairs to fix both the data and the wrappers. It yields 15–60% gains in the quality of extracted data and fully repairs the wrappers in more than 50% of cases, without site-specific knowledge. (summarized by gpt-5-nano on Feb 09 2026)
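The toy Python sketch below makes the "Markov-chain repair" idea in the summary more concrete. It is our own simplification, not WADaR's published algorithm. It assumes an entity recognizer has already annotated each extracted record as an ordered list of attribute types. The labels `TITLE` and `DIRECTOR` and the functions `build_chain` and `most_likely_structure` are hypothetical. The sketch estimates a first-order Markov chain over those labels and takes the most probable start-to-end path as the dominant attribute structure, which a repair step could then enforce.

```python
import heapq
import math
from collections import Counter, defaultdict

# Hypothetical sentinel states marking the start and end of a record.
START, END = "<s>", "</s>"


def build_chain(sequences):
    """Estimate first-order transition probabilities P(next | current)
    from annotated records (a simplification assumed for illustration)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        path = [START, *seq, END]
        for cur, nxt in zip(path, path[1:]):
            counts[cur][nxt] += 1
    return {
        cur: {nxt: c / sum(nexts.values()) for nxt, c in nexts.items()}
        for cur, nexts in counts.items()
    }


def most_likely_structure(chain):
    """Return (labels, probability) of the most probable START->END path.
    Maximising a product of probabilities equals minimising the sum of
    -log(p), which is non-negative, so Dijkstra's algorithm applies."""
    dist = {START: 0.0}
    prev = {}
    heap = [(0.0, START)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == END:
            break
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        for nxt, p in chain.get(node, {}).items():
            nd = d - math.log(p)
            if nd < dist.get(nxt, math.inf):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    if END not in dist:
        return None, 0.0
    # Walk back from END to recover the label sequence.
    path, node = [], END
    while node != START:
        node = prev[node]
        if node != START:
            path.append(node)
    return path[::-1], math.exp(-dist[END])


# Hypothetical annotations: one record has swapped values, one is incomplete.
records = [
    ["TITLE", "DIRECTOR"],
    ["TITLE", "DIRECTOR"],
    ["TITLE", "DIRECTOR"],
    ["DIRECTOR", "TITLE"],
    ["TITLE"],
]
structure, prob = most_likely_structure(build_chain(records))
print(structure, round(prob, 2))  # ['TITLE', 'DIRECTOR'] 0.36
```

In this toy data, the swapped `DIRECTOR, TITLE` record and the incomplete record lose out to the dominant structure. A repair step could use that kind of signal to flag or re-segment noisy values. The sketch does not show how WADaR itself turns such a structure into wrapper fixes.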
Authors
1. Stefano Ortona
2. Giorgio Orsi
3. Marcello Buoncristiano
4. Tim Furche
Incoming Citations (Sorted by Pagerank)
Showing 4 of 4 citing papers.
| Overall Rank | Citing Paper | Year | Venue | PageRank |
|---|---|---|---|---|
| 3,252 | Auto-Suggest: Learning-to-Recommend Data Preparation Steps Using Data Science Notebooks | 2020 | SIGMOD | 7.3178277e-05 |
| 6,412 | CERES: Distantly Supervised Relation Extraction from the Semi-Structured Web | 2018 | VLDB | 5.0740036e-05 |
| 7,826 | The Smallest Extraction Problem | 2021 | VLDB | 4.6416742e-05 |
| 9,248 | Web Record Extraction with Invariants | 2023 | VLDB | 4.3690661e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 7 of 7 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Overall Rank | Cited Paper | Year | Venue | PageRank |
|---|---|---|---|---|
| 533 | RoadRunner: Towards Automatic Data Extraction from Large Web Sites | 2001 | VLDB | 0.00020757722 |
| 2,617 | Extraction and Integration of Partially Overlapping Web Sources | 2013 | VLDB | 8.4462621e-05 |
| 3,742 | TEGRA: Table Extraction by Global Record Alignment | 2015 | SIGMOD | 6.7966898e-05 |
| 3,747 | Context-Aware Wrapping: Synchronized Data Extraction | 2007 | VLDB | 6.7917216e-05 |
| 4,387 | Hybrid In-Database Inference for Declarative Information Extraction | 2011 | SIGMOD | 6.2320072e-05 |
| 6,133 | DIADEM: Thousands of Websites to a Single Database | 2014 | VLDB | 5.1954702e-05 |
| 12,085 | Aggregating Semantic Annotators | 2013 | VLDB | 4.1945683e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | PageRank |
|---|---|---|---|---|
| 533 | RoadRunner: Towards Automatic Data Extraction from Large Web Sites | 2001 | VLDB | 0.00020757722 |
| 6,751 | Optimal Schemes for Robust Web Extraction | 2011 | VLDB | 4.939042e-05 |
| 9,026 | Robust and Noise Resistant Wrapper Induction | 2016 | SIGMOD | 4.4051668e-05 |
| 4,440 | Robust Web Extraction: An Approach Based on a Probabilistic Tree-Edit Model | 2009 | SIGMOD | 6.187819e-05 |
| 6,403 | RoadRunner: Automatic Data Extraction from Data-Intensive Web Sites | 2002 | SIGMOD | 5.0797045e-05 |
| 3,678 | Automatic Wrappers for Large Scale Web Extraction | 2011 | VLDB | 6.8517545e-05 |
| 2,617 | Extraction and Integration of Partially Overlapping Web Sources | 2013 | VLDB | 8.4462621e-05 |
| 8,406 | DADER: Hands-Off Entity Resolution with Domain Adaptation | 2022 | VLDB | 4.5220083e-05 |
| 6,133 | DIADEM: Thousands of Websites to a Single Database | 2014 | VLDB | 5.1954702e-05 |
| 8,322 | An XML-based Wrapper Generator for Web Information Extraction | 1999 | SIGMOD | 4.5435639e-05 |