Power-Law Based Estimation of Set Similarity Join Size
Summary: The paper estimates set similarity join (SSJoin) size from compact Min-Hash signatures, guided by a power-law relationship, and exploits frequent signature patterns to count support. A novel lattice-based IE counting method has complexity linear in the lattice size, enabling lightweight mining with high accuracy and efficiency. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Hongrae Lee
2. Raymond T. Ng
3. Kyuseok Shim
Incoming Citations (Sorted by Pagerank)
Showing 7 of 7 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1,396 | Can We Beat the Prefix Filtering? An Adaptive Framework for Similarity Join and Search | 2012 | SIGMOD | 0.00012204748 |
| 2,592 | Pass-Join: A Partition-based Method for Similarity Joins | 2012 | VLDB | 8.4795761e-05 |
| 2,740 | String Similarity Joins: An Experimental Evaluation | 2014 | VLDB | 8.1980628e-05 |
| 5,073 | Faerie: Efficient Filtering Algorithms for Approximate Dictionary-based Entity Extraction | 2011 | SIGMOD | 5.7177424e-05 |
| 5,151 | String Similarity Measures and Joins with Synonyms | 2013 | SIGMOD | 5.6609851e-05 |
| 5,220 | Similarity Join Size Estimation using Locality Sensitive Hashing | 2011 | VLDB | 5.6216111e-05 |
| 5,469 | Learned Cardinality Estimation for Similarity Queries | 2021 | SIGMOD | 5.4898192e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 11 of 11 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 36 | Fast Algorithms for Mining Association Rules | 1994 | VLDB | 0.00076161096 |
| 67 | The Merge/Purge Problem for Large Databases | 1995 | SIGMOD | 0.00061348205 |
| 250 | Efficient set joins on similarity predicates | 2004 | SIGMOD | 0.00030661988 |
| 266 | Efficient Exact Set-Similarity Joins | 2006 | VLDB | 0.00029718727 |
| 475 | Mining Database Structure; Or, How to Build a Data Quality Browser | 2002 | SIGMOD | 0.00022303253 |
| 678 | ConQuer: Efficient Management of Inconsistent Databases | 2005 | SIGMOD | 0.00018253213 |
| 759 | To Search or to Crawl? Towards a Query Optimizer for Text-Centric Tasks | 2006 | SIGMOD | 0.00017064615 |
| 2,171 | Selectivity Estimation For Boolean Queries | 2000 | PODS | 9.3807165e-05 |
| 2,779 | Hashed Samples: Selectivity Estimators For Set Similarity Selection Queries | 2008 | VLDB | 8.1320575e-05 |
| 3,226 | Extending Q-Grams to Estimate Selectivity of String Matching with Low Edit Distance | 2007 | VLDB | 7.3433307e-05 |
| 6,161 | Spatial Join Selectivity Using Power Laws | 2000 | SIGMOD | 5.1753664e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 4,775 | Set Similarity Joins on MapReduce: An Experimental Survey | 2018 | VLDB | 5.9315784e-05 |
| 11,247 | A Two-Level Signature Scheme for Stable Set Similarity Joins | 2023 | VLDB | 4.1945683e-05 |
| 3,459 | An Empirical Evaluation of Set Similarity Join Techniques | 2016 | VLDB | 7.072508e-05 |
| 250 | Efficient set joins on similarity predicates | 2004 | SIGMOD | 0.00030661988 |
| 266 | Efficient Exact Set-Similarity Joins | 2006 | VLDB | 0.00029718727 |
| 3,490 | Leveraging Set Relations in Exact Set Similarity Join | 2017 | VLDB | 7.0465856e-05 |
| 4,050 | An Efficient Partition Based Method for Exact Set Similarity Joins | 2016 | VLDB | 6.4953612e-05 |
| 3,708 | Is Min-Wise Hashing Optimal for Summarizing Set Intersection? | 2014 | PODS | 6.8247903e-05 |
| 4,353 | Overlap Set Similarity Joins with Theoretical Guarantees | 2018 | SIGMOD | 6.263585e-05 |
| 5,220 | Similarity Join Size Estimation using Locality Sensitive Hashing | 2011 | VLDB | 5.6216111e-05 |