DEXTER: Large-Scale Discovery and Extraction of Product Specifications on the Web
Summary: DEXTER discovers product-specification pages with a focused crawler that combines search queries and backlinks, and it detects specifications with a supervised classifier over HTML fragments. Extraction uses two wrappers: (i) a domain-independent, unsupervised wrapper that exploits structure shared across pages, and (ii) a hybrid wrapper that relies on noisy annotators. Reported results on 1.46M pages are F≈0.9, precision 0.92, and recall 0.95. (summarized by gpt-5-nano on Feb 09 2026)
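As a quick consistency check on the figures quoted in the summary, the F-measure can be recomputed from precision and recall. This is a minimal sketch, assuming the reported F is the balanced F1 (harmonic mean) and that all three numbers come from the same evaluation; the summary states neither, and the function name `f1_score` is illustrative only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Values quoted in the summary (assumed here to describe the same experiment).
f1 = f1_score(0.92, 0.95)
print(round(f1, 3))  # prints 0.935
```

Under these assumptions the recomputed value is about 0.93, which is consistent with the rounded F≈0.9 in the summary.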
Authors
1. Disheng Qiu
2. Luciano Barbosa
3. Xin Luna Dong
4. Yanyan Shen
5. Divesh Srivastava
Incoming Citations (Sorted by Pagerank)
Showing 3 of 3 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 9,248 | Web Record Extraction with Invariants | 2023 | VLDB | 4.3690661e-05 |
| 11,706 | Big Data Linkage for Product Specification Pages | 2018 | SIGMOD | 4.1945683e-05 |
| 11,775 | Building Structured Databases of Factual Knowledge from Massive Text Corpora | 2017 | SIGMOD | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 14 of 14 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 6,412 | CERES: Distantly Supervised Relation Extraction from the Semi-Structured Web | 2018 | VLDB | 5.0740036e-05 |
| 7,826 | The Smallest Extraction Problem | 2021 | VLDB | 4.6416742e-05 |
| 4,092 | Structured Annotations of Web Queries | 2010 | SIGMOD | 6.4561959e-05 |
| 3,678 | Automatic Wrappers for Large Scale Web Extraction | 2011 | VLDB | 6.8517545e-05 |
| 2,617 | Extraction and Integration of Partially Overlapping Web Sources | 2013 | VLDB | 8.4462621e-05 |
| 12,258 | ObjectRunner: Lightweight, Targeted Extraction and Querying of Structured Web Data | 2010 | VLDB | 4.1945683e-05 |
| 11,706 | Big Data Linkage for Product Specification Pages | 2018 | SIGMOD | 4.1945683e-05 |
| 6,133 | DIADEM: Thousands of Websites to a Single Database | 2014 | VLDB | 5.1954702e-05 |
| 11,844 | Potential and Pitfalls of Domain-Specific Information Extraction at Web Scale | 2016 | SIGMOD | 4.1945683e-05 |
| 3,285 | Using the Structure of Web Sites for Automatic Segmentation of Tables | 2004 | SIGMOD | 7.2759001e-05 |