Database Paper Browser


Smurf: Self-Service String Matching Using Random Forests

Summary: Smurf is a self-service string matching (SM) solution that uses active learning with random forests (RF), reducing labeling effort by 43–76% while maintaining F1. To execute the RF efficiently over two sets of strings, it applies RDBMS-style plan optimization that reuses computations across the trees in the forest. The work advances self-service SM and scalable RF execution over structured data. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 11964
Venue: VLDB
Year: 2019
Pagerank: 6.2195162e-05
Overall Rank: 4,402 | 69.38%
DOI: 10.14778/3291264.3291272


Incoming Citations (Sorted by Pagerank)

Showing 8 of 8 citing papers.


Outgoing Citations (Sorted by Pagerank)

Showing 25 of 25 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
125 | Approximate String Joins in a Database (Almost) for Free | 2001 | VLDB | 0.00044847972
179 | Efficient and Extensible Algorithms for Multi Query Optimization | 2000 | SIGMOD | 0.00037672155
250 | Efficient set joins on similarity predicates | 2004 | SIGMOD | 0.00030661988
266 | Efficient Exact Set-Similarity Joins | 2006 | VLDB | 0.00029718727
447 | Efficient Parallel Set-Similarity Joins Using MapReduce | 2010 | SIGMOD | 0.00022900171
643 | Corleone: Hands-Off Crowdsourcing for Entity Matching | 2014 | SIGMOD | 0.00018754451
712 | Magellan: Toward Building Entity Matching Management Systems | 2016 | VLDB | 0.00017732426
834 | Learning Linear Regression Models over Factorized Joins | 2016 | SIGMOD | 0.00016135159
1,043 | Adaptive Ordering of Pipelined Stream Filters | 2004 | SIGMOD | 0.00014476247
1,107 | SPRINT: A Scalable Parallel Classifier for Data Mining | 1996 | VLDB | 0.00013985717
1,167 | Learning Generalized Linear Models Over Normalized Data | 2015 | SIGMOD | 0.00013547713
1,476 | Efficient Exploitation of Similar Subexpressions for Query Processing | 2007 | SIGMOD | 0.00011779092
1,715 | V-SMART-Join: A Scalable MapReduce Framework for All-Pair Similarity Joins of Multisets and Vectors | 2012 | VLDB | 0.00010803271
2,175 | Falcon: Scaling Up Hands-Off Crowdsourced Entity Matching to Build Cloud Services | 2017 | SIGMOD | 9.3644117e-05
2,376 | Bed-Tree: An All-Purpose Index Structure for String Similarity Search Based on Edit Distance | 2010 | SIGMOD | 8.9424361e-05
2,630 | PLANET: Massively Parallel Learning of Tree Ensembles with MapReduce | 2009 | VLDB | 8.4128091e-05
2,740 | String Similarity Joins: An Experimental Evaluation | 2014 | VLDB | 8.1980628e-05
3,141 | ClusterJoin: A Similarity Joins Framework using Map-Reduce | 2014 | VLDB | 7.4829448e-05
3,459 | An Empirical Evaluation of Set Similarity Join Techniques | 2016 | VLDB | 7.072508e-05
4,353 | Overlap Set Similarity Joins with Theoretical Guarantees | 2018 | SIGMOD | 6.263585e-05
4,684 | Approximate String Joins with Abbreviations | 2018 | VLDB | 6.0006406e-05
6,605 | Dima: A Distributed In-Memory Similarity-Based Query Processing System | 2017 | VLDB | 4.9965703e-05
7,109 | Efficient Similarity Join and Search on Multi-Attribute Data | 2015 | SIGMOD | 4.8292998e-05
9,439 | On-the-Fly Token Similarity Joins in Relational Databases | 2014 | SIGMOD | 4.3423824e-05
11,739 | CloudMatcher: A Hands-Off Cloud/Crowd Service for Entity Matching | 2018 | VLDB | 4.1945683e-05

Semantically Similar Papers