Set Similarity Join on Probabilistic Data
Summary: Models probabilistic set data at set- and element-level uncertainty and defines probabilistic set similarity join (PS2J) under possible worlds semantics. Introduces world condensation and pruning techniques—Jaccard distance, probability upper-bound, and aggregate pruning—with indexing and synopses, validated by extensive experiments. (summarized by gpt-5-nano on Feb 09 2026)
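To make the summary's terms concrete, the following is a minimal brute-force sketch of a probabilistic set similarity join under possible worlds semantics. It is not the paper's algorithm: it uses no condensation, pruning, or indexing, and it covers only element-level uncertainty. It assumes each element exists independently with a given probability, and that a pair joins when the probability of its Jaccard distance being at most `gamma` reaches `alpha`. The function names and the parameters `gamma` and `alpha` are hypothetical, chosen for illustration.

```python
from itertools import product


def jaccard_distance(a, b):
    """Jaccard distance 1 - |a & b| / |a | b|; two empty sets are at distance 0."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def possible_worlds(uncertain_set):
    """Yield (concrete_set, probability) for a dict {element: existence probability}.

    Assumption: elements exist independently of one another.
    """
    items = list(uncertain_set.items())
    for mask in product([False, True], repeat=len(items)):
        prob = 1.0
        world = set()
        for (elem, p), present in zip(items, mask):
            prob *= p if present else 1.0 - p
            if present:
                world.add(elem)
        yield frozenset(world), prob


def match_probability(r, s, gamma):
    """Total probability of the world pairs whose Jaccard distance is <= gamma."""
    total = 0.0
    for world_r, prob_r in possible_worlds(r):
        for world_s, prob_s in possible_worlds(s):
            if jaccard_distance(world_r, world_s) <= gamma:
                total += prob_r * prob_s
    return total


def ps2j(R, S, gamma, alpha):
    """Return index pairs (i, j) whose match probability is at least alpha."""
    return [
        (i, j)
        for i, r in enumerate(R)
        for j, s in enumerate(S)
        if match_probability(r, s, gamma) >= alpha
    ]


if __name__ == "__main__":
    r = {"a": 1.0, "b": 0.5}  # "b" is present in half of the worlds
    s = {"a": 1.0}
    print(match_probability(r, s, gamma=0.4))
    print(ps2j([r], [s], gamma=0.4, alpha=0.45))
```

Enumerating worlds this way costs time exponential in the number of uncertain elements, which is the kind of cost that pruning and indexing techniques such as those named in the summary aim to avoid.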
Authors
1. Xiang Lian
2. Lei Chen
Incoming Citations (Sorted by Pagerank)
Showing 2 of 2 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 2,740 | String Similarity Joins: An Experimental Evaluation | 2014 | VLDB | 8.1980628e-05 |
| 11,904 | Indexing Metric Uncertain Data for Range Queries | 2015 | SIGMOD | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 11 of 11 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 91 | M-tree: An Efficient Access Method for Similarity Search in Metric Spaces | 1997 | VLDB | 0.0005181666 |
| 101 | ULDBs: Databases with Uncertainty and Lineage | 2006 | VLDB | 0.0004955674 |
| 250 | Efficient set joins on similarity predicates | 2004 | SIGMOD | 0.00030661988 |
| 266 | Efficient Exact Set-Similarity Joins | 2006 | VLDB | 0.00029718727 |
| 321 | MCDB: A Monte Carlo Approach to Managing Uncertain Data | 2008 | SIGMOD | 0.00027527389 |
| 706 | MYSTIQ: A system for finding more answers by using probabilities | 2005 | SIGMOD | 0.00017845469 |
| 721 | Data Integration with Uncertainty | 2007 | VLDB | 0.00017570539 |
| 760 | Creating Probabilistic Databases from Information Extraction Models | 2006 | VLDB | 0.00017053935 |
| 980 | BayesStore: Managing Large, Uncertain Data Repositories with Probabilistic Graphical Models | 2008 | VLDB | 0.00014879747 |
| 1,705 | U-DBMS: A Database System for Managing Constantly-Evolving Data | 2005 | VLDB | 0.00010829958 |
| 4,901 | Probabilistic String Similarity Joins | 2010 | SIGMOD | 5.8411648e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 6,705 | Consistent Query Answers in Inconsistent Probabilistic Databases | 2010 | SIGMOD | 4.9549359e-05 |
| 4,353 | Overlap Set Similarity Joins with Theoretical Guarantees | 2018 | SIGMOD | 6.263585e-05 |
| 11,305 | TokenJoin: Efficient Filtering for Set Similarity Join with Maximum Weighted Bipartite Matching | 2023 | VLDB | 4.1945683e-05 |
| 3,490 | Leveraging Set Relations in Exact Set Similarity Join | 2017 | VLDB | 7.0465856e-05 |
| 4,050 | An Efficient Partition Based Method for Exact Set Similarity Joins | 2016 | VLDB | 6.4953612e-05 |
| 250 | Efficient set joins on similarity predicates | 2004 | SIGMOD | 0.00030661988 |
| 3,459 | An Empirical Evaluation of Set Similarity Join Techniques | 2016 | VLDB | 7.072508e-05 |
| 4,901 | Probabilistic String Similarity Joins | 2010 | SIGMOD | 5.8411648e-05 |
| 11,979 | Similarity Joins for Uncertain Strings | 2014 | SIGMOD | 4.1945683e-05 |
| 266 | Efficient Exact Set-Similarity Joins | 2006 | VLDB | 0.00029718727 |