Balance-Aware Distributed String Similarity-Based Query Processing System
Summary: Dima is a distributed in-memory system built on Spark for similarity-based queries. It supports similarity selection, similarity join, and top-k queries. Balance-aware signatures, combined with global and local indexes, balance the load across nodes and accelerate query processing. Experiments on four real datasets show speedups of 1–3 orders of magnitude.
(summarized by gpt-5-nano on Feb 09 2026)
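The three query types named in the summary can be illustrated with a minimal single-machine sketch. It assumes Jaccard similarity over whitespace-separated tokens and uses brute-force scans; it is not the paper's balance-aware signature or indexing scheme, and every function name here (`jaccard`, `similarity_select`, `similarity_join`, `top_k`) is hypothetical and chosen only for illustration.

```python
import heapq
from itertools import product


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity over lowercase whitespace-delimited token sets."""
    x, y = set(a.lower().split()), set(b.lower().split())
    if not x and not y:
        return 1.0  # convention: two empty strings are identical
    return len(x & y) / len(x | y)


def similarity_select(records, query, threshold):
    """Return every record whose similarity to the query meets the threshold."""
    return [r for r in records if jaccard(r, query) >= threshold]


def similarity_join(left, right, threshold):
    """Return all (l, r) pairs whose similarity meets the threshold.

    Brute force over the cross product; a real system would prune
    candidate pairs with signatures and indexes instead.
    """
    return [(l, r) for l, r in product(left, right) if jaccard(l, r) >= threshold]


def top_k(records, query, k):
    """Return the k records most similar to the query, best first."""
    return heapq.nlargest(k, records, key=lambda r: jaccard(r, query))


# Illustrative data only (titles borrowed from the citation tables on this page).
titles = [
    "efficient exact set similarity joins",
    "efficient set joins on similarity predicates",
    "distributed trajectory similarity search",
]
```

A call such as `similarity_select(titles, "set similarity joins", 0.55)` scans every record, which is exactly the cost that a signature-based, index-backed design aims to avoid.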
- Paper ID: 12018
- Venue: VLDB
- Year: 2019
- Pagerank: 4.2751057e-05
- Overall Rank: 9,832 | 31.61%
- DOI: 10.14778/3329772.3329774
Incoming Non-self Citations Over Time
Incoming Citations (Sorted by Pagerank)
Showing 2 of 2 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 23 of 23 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
| --- | --- | --- | --- | --- |
| 125 | Approximate String Joins in a Database (Almost) for Free | 2001 | VLDB | 0.00044847972 |
| 155 | Robust and Efficient Fuzzy Match for Online Data Cleaning | 2003 | SIGMOD | 0.00040637896 |
| 250 | Efficient set joins on similarity predicates | 2004 | SIGMOD | 0.00030661988 |
| 266 | Efficient Exact Set-Similarity Joins | 2006 | VLDB | 0.00029718727 |
| 447 | Efficient Parallel Set-Similarity Joins Using MapReduce | 2010 | SIGMOD | 0.00022900171 |
| 712 | Magellan: Toward Building Entity Matching Management Systems | 2016 | VLDB | 0.00017732426 |
| 1,202 | VGRAM: Improving Performance of Approximate Queries on String Collections Using Variable-Length Grams | 2007 | VLDB | 0.00013326298 |
| 1,234 | Ed-Join: An Efficient Algorithm for Similarity Joins With Edit Distance Constraints | 2008 | VLDB | 0.00013122499 |
| 1,305 | Bayesian Locality Sensitive Hashing for Fast Similarity Search | 2012 | VLDB | 0.00012687101 |
| 1,396 | Can We Beat the Prefix Filtering? An Adaptive Framework for Similarity Join and Search | 2012 | SIGMOD | 0.00012204748 |
| 1,715 | V-SMART-Join: A Scalable MapReduce Framework for All-Pair Similarity Joins of Multisets and Vectors | 2012 | VLDB | 0.00010803271 |
| 2,024 | ATLAS: A Probabilistic Algorithm for High Dimensional Similarity Search | 2011 | SIGMOD | 9.7519678e-05 |
| 2,175 | Falcon: Scaling Up Hands-Off Crowdsourced Entity Matching to Build Cloud Services | 2017 | SIGMOD | 9.3644117e-05 |
| 2,376 | Bed-Tree: An All-Purpose Index Structure for String Similarity Search Based on Edit Distance | 2010 | SIGMOD | 8.9424361e-05 |
| 2,592 | Pass-Join: A Partition-based Method for Similarity Joins | 2012 | VLDB | 8.4795761e-05 |
| 2,740 | String Similarity Joins: An Experimental Evaluation | 2014 | VLDB | 8.1980628e-05 |
| 4,050 | An Efficient Partition Based Method for Exact Set Similarity Joins | 2016 | VLDB | 6.4953612e-05 |
| 4,353 | Overlap Set Similarity Joins with Theoretical Guarantees | 2018 | SIGMOD | 6.263585e-05 |
| 4,988 | Incremental Maintenance of Length Normalized Indexes for Approximate String Matching | 2009 | SIGMOD | 5.783959e-05 |
| 6,605 | Dima: A Distributed In-Memory Similarity-Based Query Processing System | 2017 | VLDB | 4.9965703e-05 |
| 6,726 | A Pivotal Prefix Based Filtering Algorithm for String Similarity Search | 2014 | SIGMOD | 4.9484027e-05 |
| 7,109 | Efficient Similarity Join and Search on Multi-Attribute Data | 2015 | SIGMOD | 4.8292998e-05 |
| 7,588 | Scalable Column Concept Determination for Web Tables Using Large Knowledge Bases | 2013 | VLDB | 4.7030914e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
| --- | --- | --- | --- | --- |
| 4,650 | LocationSpark: A Distributed In-Memory Data Management System for Big Spatial Data | 2016 | VLDB | 6.0234336e-05 |
| 10,930 | Similarity Joins of Sparse Features | 2024 | SIGMOD | 4.1945683e-05 |
| 7,250 | A Scalable and Generic Approach to Range Joins | 2022 | VLDB | 4.78908e-05 |
| 1,776 | Distributed Trajectory Similarity Search | 2017 | VLDB | 0.00010593716 |
| 12,247 | SimDB: A Similarity-aware Database System | 2010 | SIGMOD | 4.1945683e-05 |
| 13,473 | Exploiting Database Similarity Joins for Metric Spaces | 2012 | VLDB | - |
| 1,435 | Simba: Efficient In-Memory Spatial Analytics | 2016 | SIGMOD | 0.00012004456 |
| 2,192 | DITA: Distributed In-Memory Trajectory Analytics | 2018 | SIGMOD | 9.3185895e-05 |
| 11,716 | DITA: A Distributed In-Memory Trajectory Analytics System | 2018 | SIGMOD | 4.1945683e-05 |
| 6,605 | Dima: A Distributed In-Memory Similarity-Based Query Processing System | 2017 | VLDB | 4.9965703e-05 |