Approximate Matching of Hierarchical Data Using pq-Grams
Summary: The paper addresses approximate matching of hierarchical data from autonomous sources using pq-grams. The pq-gram distance is an efficient, scalable approximation of the tree edit distance for ordered labeled trees. It makes it possible to find near-matches between hierarchical records (e.g., addresses) and is validated on synthetic and real data. (summarized by gpt-5-nano on Feb 09 2026)
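To make the idea in the summary concrete, here is a minimal Python sketch of a pq-gram profile and the resulting distance. It is not taken from this page: it follows the commonly described construction, in which each pq-gram consists of a stem (a node and its p-1 ancestors) and a base (q consecutive children), with a dummy label padding missing positions. The tree encoding as `(label, [children])` tuples, the function names `pq_profile` and `pq_gram_distance`, and the defaults `p=2`, `q=3` are assumptions made for illustration.

```python
from collections import Counter

NULL = "*"  # assumed dummy label used to pad missing ancestors and children


def pq_profile(tree, p=2, q=3):
    """Return the bag (multiset) of pq-grams of an ordered labeled tree.

    `tree` is a hypothetical encoding: (label, [child, child, ...]).
    Each pq-gram is a tuple of p stem labels followed by q base labels.
    """
    profile = Counter()

    def visit(node, ancestors):
        label, children = node
        # Shift the current label into the stem register of length p.
        stem = ancestors[1:] + (label,)
        base = (NULL,) * q
        if not children:
            # A leaf contributes one pq-gram with an all-dummy base.
            profile[stem + base] += 1
            return
        for child in children:
            # Slide the base window over the children, left to right.
            base = base[1:] + (child[0],)
            profile[stem + base] += 1
            visit(child, stem)
        # Pad with q-1 dummies after the last child.
        for _ in range(q - 1):
            base = base[1:] + (NULL,)
            profile[stem + base] += 1

    visit(tree, (NULL,) * p)
    return profile


def pq_gram_distance(t1, t2, p=2, q=3):
    """Size of the bag symmetric difference of the two profiles."""
    a, b = pq_profile(t1, p, q), pq_profile(t2, p, q)
    shared = sum((a & b).values())  # bag intersection size
    return sum(a.values()) + sum(b.values()) - 2 * shared


# Two small trees that differ in one leaf label.
t1 = ("a", [("b", []), ("c", [])])
t2 = ("a", [("b", []), ("d", [])])
print(pq_gram_distance(t1, t2))
```

The recursion visits every node once, so building a profile is linear in the tree size for fixed p and q. Very deep trees would exceed Python's default recursion limit, in which case an explicit stack would be the safer choice.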
Authors
1. Nikolaus Augsten
2. Michael Böhlen
3. Johann Gamper
Incoming Citations (Sorted by Pagerank)
Showing 7 of 7 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 951 | Comparing Stars: On Approximating Graph Edit Distance | 2009 | VLDB | 0.00015106325 |
| 5,615 | A Scalable Index for Top-k Subtree Similarity Queries | 2019 | SIGMOD | 5.4101086e-05 |
| 6,732 | An Incrementally Maintainable Index for Approximate Lookups in Hierarchical Data | 2006 | VLDB | 4.9477058e-05 |
| 7,215 | SyncSignature: A Simple, Efficient, Parallelizable Framework for Tree Similarity Joins | 2023 | VLDB | 4.7985991e-05 |
| 10,706 | Extensible and Robust Evaluation of Similarity Queries | 2025 | VLDB | 4.1945683e-05 |
| 12,089 | Synthetising Changes in XML Documents as PULs | 2013 | VLDB | 4.1945683e-05 |
| 12,357 | The Power of Two Min-Hashes for Similarity Search among Hierarchical Data Objects | 2008 | PODS | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 8 of 8 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 125 | Approximate String Joins in a Database (Almost) for Free | 2001 | VLDB | 0.00044847972 |
| 193 | On Supporting Containment Queries in Relational Database Management Systems | 2001 | SIGMOD | 0.00035610321 |
| 240 | Holistic Twig Joins: Optimal XML Pattern Matching | 2002 | SIGMOD | 0.00031603463 |
| 1,149 | A Comprehensive XQuery to SQL Translation using Dynamic Interval Encoding | 2003 | SIGMOD | 0.0001365931 |
| 1,390 | Change Detection in Hierarchically Structured Information | 1996 | SIGMOD | 0.00012248349 |
| 2,784 | Approximate XML Joins | 2002 | SIGMOD | 8.128931e-05 |
| 3,120 | Holistic Twig Joins on Indexed XML Documents | 2003 | VLDB | 7.5295938e-05 |
| 3,419 | Approximate XML Query Answers | 2004 | SIGMOD | 7.1173416e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 2,409 | TreeSpan: Efficiently Computing Similarity All-Matching | 2012 | SIGMOD | 8.8776858e-05 |
| 3,226 | Extending Q-Grams to Estimate Selectivity of String Matching with Low Edit Distance | 2007 | VLDB | 7.3433307e-05 |
| 12,357 | The Power of Two Min-Hashes for Similarity Search among Hierarchical Data Objects | 2008 | PODS | 4.1945683e-05 |
| 125 | Approximate String Joins in a Database (Almost) for Free | 2001 | VLDB | 0.00044847972 |
| 11,559 | Approximate Pattern Matching in Massive Graphs with Precision and Recall Guarantees | 2020 | SIGMOD | 4.1945683e-05 |
| 7,708 | Efficient Top-k Algorithms for Approximate Substring Matching | 2013 | SIGMOD | 4.6721808e-05 |
| 2,784 | Approximate XML Joins | 2002 | SIGMOD | 8.128931e-05 |
| 6,241 | Scaling Similarity Joins over Tree-Structured Data | 2015 | VLDB | 5.1411469e-05 |
| 3,199 | Similarity Evaluation on Tree-structured Data | 2005 | SIGMOD | 7.3927291e-05 |
| 6,732 | An Incrementally Maintainable Index for Approximate Lookups in Hierarchical Data | 2006 | VLDB | 4.9477058e-05 |