Measuring the Structural Similarity of Semistructured Documents Using Entropy
Summary: Proposes an entropy-based measure of the structural similarity of semistructured documents. The method works on the extracted structure of the documents and computes entropy using Ziv-Lempel compression or Ziv-Merhav cross-parsing. The paper claims to be the first linear-time approach to this problem, with clustering results that rival those of existing methods. (summarized by gpt-5-nano on Feb 09 2026)
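For illustration only, here is a minimal sketch of the classic Ziv-Merhav estimate of the relative entropy of a sequence z with respect to a sequence x, Δ(z‖x) ≈ [c(z|x)·log2 n − c(z)·log2 c(z)] / n. Here n = |z|, c(z|x) is the number of phrases produced when z is cross-parsed into the longest substrings of x, and c(z) is the number of LZ78 phrases in z. The sketch assumes each document's structure has already been serialized into a string with one character per tag. That encoding and all function names are hypothetical and are not taken from the paper. The substring search is naive, so unlike the linear-time approach the paper claims, this sketch is not linear-time.

```python
import math


def cross_parse_count(z: str, x: str) -> int:
    """Ziv-Merhav cross-parsing: split z into successive longest prefixes
    that occur as substrings of x; a symbol absent from x is its own phrase."""
    i, count, n = 0, 0, len(z)
    while i < n:
        length = 0
        # Grow the current phrase while it still occurs somewhere in x.
        while i + length < n and z[i:i + length + 1] in x:
            length += 1
        i += max(length, 1)  # always advance by at least one symbol
        count += 1
    return count


def lz78_count(z: str) -> int:
    """Number of phrases in the LZ78 incremental parsing of z
    (a trailing, already-seen fragment counts as one extra phrase)."""
    seen, current, count = set(), "", 0
    for ch in z:
        current += ch
        if current not in seen:
            seen.add(current)
            count += 1
            current = ""
    if current:
        count += 1
    return count


def zm_relative_entropy(z: str, x: str) -> float:
    """Ziv-Merhav estimate of the relative entropy of z with respect to x
    (in bits per symbol); lower values mean x describes z better."""
    n = len(z)
    if n == 0:
        return 0.0
    c_zx = cross_parse_count(z, x)
    c_z = lz78_count(z)
    return (c_zx * math.log2(n) - c_z * math.log2(c_z)) / n
```

The estimator is only asymptotically meaningful and can be negative on short inputs, so its values are best compared against each other rather than read as absolute bit counts. For example, `zm_relative_entropy("abab", "abab")` is lower than `zm_relative_entropy("abab", "cdcd")`.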
Authors
- Sven Helmer
Incoming Citations (Sorted by Pagerank)
Showing 1 of 1 citing paper.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 3,758 | Keyword Search over Relational Databases: A Metadata Approach | 2011 | SIGMOD | 6.7824746e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 11 of 11 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 67 | The Merge/Purge Problem for Large Databases | 1995 | SIGMOD | 0.00061348205 |
| 303 | Generic Schema Matching with Cupid | 2001 | VLDB | 0.00028301477 |
| 533 | RoadRunner: Towards Automatic Data Extraction from Large Web Sites | 2001 | VLDB | 0.00020757722 |
| 587 | Extracting Structured Data from Web Pages | 2003 | SIGMOD | 0.00019648348 |
| 728 | Meaningful Change Detection in Structured Data | 1997 | SIGMOD | 0.00017494982 |
| 992 | XTRACT: A System for Extracting Document Type Descriptors from XML Documents | 2000 | SIGMOD | 0.00014799689 |
| 1,163 | Extracting Schema from Semistructured Data | 1998 | SIGMOD | 0.00013577466 |
| 1,192 | The XXL Search Engine: Ranked Retrieval of XML Data using Indexes and Ontologies | 2002 | SIGMOD | 0.00013432765 |
| 1,390 | Change Detection in Hierarchically Structured Information | 1996 | SIGMOD | 0.00012248349 |
| 2,698 | Visual Web Information Extraction with Lixto | 2001 | VLDB | 8.2753317e-05 |
| 5,761 | Capturing both Types and Constraints in Data Integration | 2003 | SIGMOD | 5.3377412e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 4,342 | LinkClus: Efficient Clustering via Heterogeneous Semantic Links | 2006 | VLDB | 6.2758722e-05 |
| 6,018 | Relative Lempel-Ziv Factorization for Efficient Storage and Retrieval of Web Collections | 2012 | VLDB | 5.2415551e-05 |
| 4,531 | Efficient Document Analytics on Compressed Data: Method, Challenges, Algorithms, Insights | 2018 | VLDB | 6.1073703e-05 |
| 7,256 | Effective and Efficient Retrieval of Structured Entities | 2020 | VLDB | 4.7869419e-05 |
| 7,522 | Efficient and Tunable Similar Set Retrieval | 2001 | SIGMOD | 4.7180617e-05 |
| 4,951 | Mining Document Collections to Facilitate Accurate Approximate Entity Matching | 2009 | VLDB | 5.8100413e-05 |
| 428 | Latent Semantic Indexing: A Probabilistic Analysis | 1998 | PODS | 0.00023512226 |
| 6,241 | Scaling Similarity Joins over Tree-Structured Data | 2015 | VLDB | 5.1411469e-05 |
| 4,250 | Local Similarity Search for Unstructured Text | 2016 | SIGMOD | 6.3241139e-05 |
| 3,199 | Similarity Evaluation on Tree-structured Data | 2005 | SIGMOD | 7.3927291e-05 |