DIADEM: Thousands of Websites to a Single Database
Summary: Automatic full-site extraction at scale using a self-adaptive network of relational transducers. DIADEM produces exhaustive wrappers for thousands of sites across domains, achieving 97% precision on more than 90% of sites by combining phenomenological and ontological knowledge. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Tim Furche
2. Georg Gottlob
3. Giovanni Grasso
4. Xiaonan Guo
5. Giorgio Orsi
6. Christian Schallhart
7. Cheng Wang
Incoming Citations (Sorted by Pagerank)
Showing 7 of 7 citing papers.
| Overall Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 5,705 | Datalog Unchained | 2021 | PODS | 5.3621239e-05 |
| 6,195 | WADaR: Joint Wrapper and Data Repair | 2015 | VLDB | 5.1618114e-05 |
| 6,412 | CERES: Distantly Supervised Relation Extraction from the Semi-Structured Web | 2018 | VLDB | 5.0740036e-05 |
| 7,826 | The Smallest Extraction Problem | 2021 | VLDB | 4.6416742e-05 |
| 7,919 | DEXTER: Large-Scale Discovery and Extraction of Product Specifications on the Web | 2015 | VLDB | 4.616746e-05 |
| 9,026 | Robust and Noise Resistant Wrapper Induction | 2016 | SIGMOD | 4.4051668e-05 |
| 11,543 | Migrating a Privacy-Safe Information Extraction System to a Software 2.0 Design | 2020 | CIDR | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 13 of 13 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 5,774 | A Hierarchical Approach to Model Web Query Interfaces for Web Source Integration | 2009 | VLDB | 5.3313642e-05 |
| 12,258 | ObjectRunner: Lightweight, Targeted Extraction and Querying of Structured Web Data | 2010 | VLDB | 4.1945683e-05 |
| 587 | Extracting Structured Data from Web Pages | 2003 | SIGMOD | 0.00019648348 |
| 3,285 | Using the Structure of Web Sites for Automatic Segmentation of Tables | 2004 | SIGMOD | 7.2759001e-05 |
| 11,844 | Potential and Pitfalls of Domain-Specific Information Extraction at Web Scale | 2016 | SIGMOD | 4.1945683e-05 |
| 3,931 | Extracting and Querying a Comprehensive Web Database | 2009 | CIDR | 6.6193836e-05 |
| 4,106 | Extracting Databases from Dark Data with DeepDive | 2016 | SIGMOD | 6.4456184e-05 |
| 2,617 | Extraction and Integration of Partially Overlapping Web Sources | 2013 | VLDB | 8.4462621e-05 |
| 6,412 | CERES: Distantly Supervised Relation Extraction from the Semi-Structured Web | 2018 | VLDB | 5.0740036e-05 |
| 12,590 | An Automatic Data Grabber for Large Web Sites | 2004 | VLDB | 4.1945683e-05 |