| 1,187 | JOSIE: Overlap Set Similarity Search for Finding Joinable Tables in Data Lakes | 2019 | SIGMOD | 0.00013443639 |
| 1,277 | The Data Civilizer System | 2017 | CIDR | 0.00012879695 |
| 1,612 | Detecting Data Errors: Where are we and what needs to be done? | 2016 | VLDB | 0.00011142794 |
| 2,592 | Pass-Join: A Partition-based Method for Similarity Joins | 2012 | VLDB | 8.4795761e-05 |
| 3,225 | DeltaPQ: Lossless Product Quantization Code Compression for High Dimensional Similarity Search | 2020 | VLDB | 7.3463484e-05 |
| 3,624 | SeRF: Segment Graph for Range-Filtering Approximate Nearest Neighbor Search | 2024 | SIGMOD | 6.9056e-05 |
| 4,050 | An Efficient Partition Based Method for Exact Set Similarity Joins | 2016 | VLDB | 6.4953612e-05 |
| 4,353 | Overlap Set Similarity Joins with Theoretical Guarantees | 2018 | SIGMOD | 6.263585e-05 |
| 4,684 | Approximate String Joins with Abbreviations | 2018 | VLDB | 6.0006406e-05 |
| 5,058 | A Demo of the Data Civilizer System | 2017 | SIGMOD | 5.7280139e-05 |
| 5,073 | Faerie: Efficient Filtering Algorithms for Approximate Dictionary-based Entity Extraction | 2011 | SIGMOD | 5.7177424e-05 |
| 5,179 | SilkMoth: An Efficient Method for Finding Related Sets with Maximum Matching Constraints | 2017 | VLDB | 5.6428428e-05 |
| 5,362 | Cost-Effective Crowdsourced Entity Resolution: A Partial-Order Approach | 2016 | SIGMOD | 5.5473503e-05 |
| 5,474 | Efficient Load-Balanced Butterfly Counting on GPU | 2022 | VLDB | 5.4881807e-05 |
| 6,146 | Distributed Graph Simulation: Impossibility and Possibility | 2014 | VLDB | 5.1857597e-05 |
| 6,605 | Dima: A Distributed In-Memory Similarity-Based Query Processing System | 2017 | VLDB | 4.9965703e-05 |
| 6,726 | A Pivotal Prefix Based Filtering Algorithm for String Similarity Search | 2014 | SIGMOD | 4.9484027e-05 |
| 7,109 | Efficient Similarity Join and Search on Multi-Attribute Data | 2015 | SIGMOD | 4.8292998e-05 |
| 7,204 | ARKGraph: All-Range Approximate K-Nearest-Neighbor Graph | 2023 | VLDB | 4.8015761e-05 |
| 7,588 | Scalable Column Concept Determination for Web Tables Using Large Knowledge Bases | 2013 | VLDB | 4.7030914e-05 |
| 7,635 | Allign: Aligning All-Pair Near-Duplicate Passages in Long Texts | 2021 | SIGMOD | 4.6908858e-05 |
| 7,700 | Near-Duplicate Text Alignment with One Permutation Hashing | 2024 | SIGMOD | 4.6744372e-05 |
| 8,291 | TxtAlign: Efficient Near-Duplicate Text Alignment Search via Bottom-k Sketches for Plagiarism Detection | 2022 | SIGMOD | 4.5435639e-05 |
| 8,656 | Dynamic Range-Filtering Approximate Nearest Neighbor Search | 2025 | VLDB | 4.4737647e-05 |
| 9,567 | META: An Efficient Matching-Based Method for Error-Tolerant Autocompletion | 2016 | VLDB | 4.3254416e-05 |
| 9,832 | Balance-Aware Distributed String Similarity-Based Query Processing System | 2019 | VLDB | 4.2751057e-05 |
| 9,876 | Near-Duplicate Sequence Search at Scale for Large Language Model Memorization Evaluation | 2023 | SIGMOD | 4.2667743e-05 |
| 10,266 | Near-Duplicate Text Alignment under Weighted Jaccard Similarity | 2026 | VLDB | 4.1945683e-05 |
| 11,343 | SPINE: Scaling up Programming-by-Negative-Example for String Filtering and Transformation | 2022 | SIGMOD | 4.1945683e-05 |