Big Data Linkage for Product Specification Pages
Summary: RaF (Redundancy as Friend) performs big data linkage for product specification pages by discovering and resolving identifiers before performing schema alignment. By exploiting the global redundancy of identifiers across sources and the local homogeneity of pages within each source, it links millions of pages from both head and tail sources. It is evaluated on the DEXTER dataset (1.9M pages from 7.1k sources). (summarized by gpt-5-nano on Feb 09 2026)
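To make the identifier-first idea more concrete, here is a minimal Python sketch under heavy simplifying assumptions; it is not RaF's actual algorithm. It assumes each page has already been reduced to a set of candidate values (hypothetical input). It treats a value as an identifier if it recurs in at least `min_sources` sources, a crude stand-in for global redundancy, and if it occurs on only one page per source, which is our own proxy rather than necessarily the paper's notion of local homogeneity. Pages sharing such a value are then merged with union-find. The names `link_pages`, `find`, `min_sources`, and the sample data are all hypothetical.

```python
from collections import defaultdict


def find(parent, x):
    # Union-find lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def link_pages(pages, min_sources=2):
    """Cluster pages that share a candidate identifier value (toy heuristic).

    pages: dict mapping page_id -> (source, set of candidate values).
    Returns a sorted list of sorted page-id clusters.
    """
    # Inverted index: value -> source -> page ids containing that value.
    index = defaultdict(lambda: defaultdict(list))
    for pid, (source, values) in pages.items():
        for value in values:
            index[value][source].append(pid)

    parent = {pid: pid for pid in pages}
    for value, by_source in index.items():
        # Assumed redundancy test: the value must recur in enough sources.
        if len(by_source) < min_sources:
            continue
        # Assumed uniqueness test: at most one page per source carries it.
        if any(len(pids) > 1 for pids in by_source.values()):
            continue
        linked = [pids[0] for pids in by_source.values()]
        for pid in linked[1:]:
            root_a, root_b = find(parent, linked[0]), find(parent, pid)
            if root_a != root_b:
                parent[root_b] = root_a

    # Group every page under its union-find root.
    clusters = defaultdict(list)
    for pid in pages:
        clusters[find(parent, pid)].append(pid)
    return sorted(sorted(c) for c in clusters.values())


pages = {
    "a1": ("shopA", {"SONY-A7III", "camera"}),
    "a2": ("shopA", {"camera"}),
    "b1": ("shopB", {"SONY-A7III", "camera"}),
    "b2": ("shopB", {"NIKON-Z6", "camera"}),
    "c1": ("shopC", {"NIKON-Z6"}),
}
print(link_pages(pages))  # [['a1', 'b1'], ['a2'], ['b2', 'c1']]
```

In this toy data the generic token "camera" is discarded because it appears on several pages of the same source, while the model codes recur across sources on a single page each and therefore link pages `a1`/`b1` and `b2`/`c1`.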
Incoming Non-self Citations Over Time
No non-self incoming citations found for this paper in this database.
Authors
1. Disheng Qiu
2. Luciano Barbosa
3. Valter Crescenzi
4. Paolo Merialdo
5. Divesh Srivastava
Incoming Citations (Sorted by Pagerank)
Showing 0 of 0 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
Outgoing Citations (Sorted by Pagerank)
Showing 8 of 8 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 398 | Big Data Integration | 2013 | VLDB | 0.00024372588 |
| 587 | Extracting Structured Data from Web Pages | 2003 | SIGMOD | 0.00019648348 |
| 1,851 | An Analysis of Structured Data on the Web | 2012 | VLDB | 0.00010327871 |
| 2,617 | Extraction and Integration of Partially Overlapping Web Sources | 2013 | VLDB | 8.4462621e-05 |
| 4,137 | Exploiting Content Redundancy for Web Information Extraction | 2010 | VLDB | 6.4181549e-05 |
| 4,440 | Robust Web Extraction: An Approach Based on a Probabilistic Tree-Edit Model | 2009 | SIGMOD | 6.187819e-05 |
| 7,006 | Synthesizing Products for Online Catalogs | 2011 | VLDB | 4.8653916e-05 |
| 7,919 | DEXTER: Large-Scale Discovery and Extraction of Product Specifications on the Web | 2015 | VLDB | 4.616746e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 12,223 | Schema Clustering and Retrieval for Multi-domain Pay-As-You-Go Data Integration Systems | 2010 | SIGMOD | 4.1945683e-05 |
| 672 | An Interactive Clustering-based Approach to Integrating Source Query Interfaces on the Deep Web | 2004 | SIGMOD | 0.00018355746 |
| 229 | Reference Reconciliation in Complex Information Spaces | 2005 | SIGMOD | 0.00032242633 |
| 4,383 | Incremental Record Linkage | 2014 | VLDB | 6.2383094e-05 |
| 2,420 | From Data Fusion to Knowledge Fusion | 2014 | VLDB | 8.8530994e-05 |
| 902 | Statistical Schema Matching across Web Query Interfaces | 2003 | SIGMOD | 0.00015486247 |
| 13,602 | Information Discovery in Loosely Integrated Data | 2007 | SIGMOD | - |
| 6,810 | Record Linkage with Uniqueness Constraints and Erroneous Values | 2010 | VLDB | 4.9203397e-05 |
| 7,006 | Synthesizing Products for Online Catalogs | 2011 | VLDB | 4.8653916e-05 |
| 3,992 | Discovering Linkage Points over Web Data | 2013 | VLDB | 6.5544834e-05 |