Database Paper Browser

DEXTER: Large-Scale Discovery and Extraction of Product Specifications on the Web

Summary: DEXTER locates product-specification pages with a focused crawler (queries and backlinks) and detects specifications with a supervised classifier over HTML fragments. Extraction uses two wrappers: (i) a domain-independent, unsupervised wrapper built from shared structure, and (ii) a noisy-annotator hybrid. Results on 1.46M pages show F≈0.9, with precision 0.92 and recall 0.95. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 11135
Venue: VLDB
Year: 2015
Pagerank: 4.616746e-05
Overall Rank: 7,919 | 44.91%
DOI: -

Incoming Citations (Sorted by Pagerank)

Showing 3 of 3 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
9,248 | Web Record Extraction with Invariants | 2023 | VLDB | 4.3690661e-05
11,706 | Big Data Linkage for Product Specification Pages | 2018 | SIGMOD | 4.1945683e-05
11,775 | Building Structured Databases of Factual Knowledge from Massive Text Corpora | 2017 | SIGMOD | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 14 of 14 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Semantically Similar Papers