Database Paper Browser

The Case For Language Model Approximated LIKE Predicate

Summary: SMILE reframes wildcard LIKE evaluation as neural pattern decoding: a compact, column-local language model translates complex LIKE predicates into small candidate sets, which are then verified via hash lookups. This yields evaluation cost that is asymptotically invariant to dataset size and robust to drift, outperforming trigram and B+-tree indexes by large margins. (summarized by gpt-5.4-mini on Apr 11 2026)
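The propose-then-verify flow described in the summary can be sketched roughly as follows. This is an illustrative sketch only, not the paper's implementation: `stub_proposer` is a hypothetical stand-in for the column-local language model, and `build_value_index`, `like_to_regex`, and `evaluate_like` are names assumed here for illustration. The sketch also assumes that each proposed candidate is checked both for presence in the column (a hash lookup) and against the LIKE pattern itself.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate SQL LIKE wildcards (% and _) into a regex for full matching."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # % matches any run of characters
        elif ch == "_":
            parts.append(".")    # _ matches exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def build_value_index(column):
    """Hash index from each distinct column value to the row ids holding it."""
    index = {}
    for row_id, value in enumerate(column):
        index.setdefault(value, []).append(row_id)
    return index


def evaluate_like(pattern, propose_candidates, value_index):
    """Keep only candidates that exist in the column and satisfy the pattern."""
    matcher = like_to_regex(pattern)
    row_ids = []
    # dict.fromkeys drops duplicate proposals while preserving their order.
    for candidate in dict.fromkeys(propose_candidates(pattern)):
        rows = value_index.get(candidate)  # hash lookup, independent of table size
        if rows is not None and matcher.fullmatch(candidate):
            row_ids.extend(rows)
    return sorted(row_ids)


def stub_proposer(pattern):
    """Hypothetical stand-in for the model: returns a fixed candidate list."""
    return ["apple pie", "apple cake", "banana split"]


column = ["apple pie", "apple tart", "banana split", "apple pie"]
value_index = build_value_index(column)
print(evaluate_like("apple%", stub_proposer, value_index))
```

In this sketch, `apple cake` is discarded because it is not in the column, and `banana split` is discarded because it does not match the pattern. A matching value the proposer never emits (here `apple tart`) is not returned, so the quality of the result depends entirely on the candidate generator.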

Paper ID: 7528
Venue: SIGMOD
Year: 2026
Pagerank: 4.1945683e-05
Overall Rank: 10,216 | 28.93%
DOI: 10.1145/3786703

Incoming Non-self Citations Over Time

No non-self incoming citations found for this paper in this database.

Incoming Citations (Sorted by Pagerank)

Showing 0 of 0 citing papers.


Outgoing Citations (Sorted by Pagerank)

Showing 43 of 43 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank Cited Paper Year Venue Pagerank
102 The Case for Learned Index Structures 2018 SIGMOD 0.00049545203
112 Potter's Wheel: An Interactive Data Cleaning System 2001 VLDB 0.00047045036
125 Approximate String Joins in a Database (Almost) for Free 2001 VLDB 0.00044847972
155 Robust and Efficient Fuzzy Match for Online Data Cleaning 2003 SIGMOD 0.00040637896
204 Learned Cardinalities: Estimating Correlated Joins with Deep Learning 2019 CIDR 0.00034784455
405 Approximate Query Processing Using Wavelets 2000 VLDB 0.00024057494
758 Deep Unsupervised Cardinality Estimation 2020 VLDB 0.0001706608
805 Evaluating Top-k Selection Queries 1999 VLDB 0.00016437265
910 NeuroCard: One Cardinality Estimator for All Tables 2021 VLDB 0.00015423056
1,012 NADEEF: A Commodity Data Cleaning System 2013 SIGMOD 0.0001464733
1,202 VGRAM: Improving Performance of Approximate Queries on String Collections Using Variable-Length Grams 2007 VLDB 0.00013326298
1,262 RankSQL: Query Algebra and Optimization for Relational Top-k Queries 2005 SIGMOD 0.00012986539
1,375 FITing-Tree: A Data-aware Index Structure 2019 SIGMOD 0.00012303141
1,638 Cardinality Estimation in DBMS: A Comprehensive Benchmark Evaluation 2022 VLDB 0.00011049779
2,009 IO-Top-k: Index-access Optimized Top-k Query Processing 2006 VLDB 9.7977564e-05
2,040 A Study of the Fundamental Performance Characteristics of GPUs and CPUs for Database Analytics 2020 SIGMOD 9.7057698e-05
2,073 Extending Autocompletion To Tolerate Errors 2009 SIGMOD 9.6142791e-05
2,193 Cost-Based Variable-Length-Gram Selection for String Collections to Support Approximate Queries Efficiently 2008 SIGMOD 9.3178557e-05
2,552 Updatable Learned Index with Precise Positions 2021 VLDB 8.5530411e-05
2,823 Interaction between Record Matching and Data Repairing 2011 SIGMOD 8.0593894e-05
3,611 SNARF: A Learning-Enhanced Range Filter 2022 VLDB 6.9191399e-05
3,944 AQP++: Connecting Approximate Query Processing With Aggregate Precomputation for Interactive Analytics 2018 SIGMOD 6.6078243e-05
4,097 The Case for a Learned Sorting Algorithm 2020 SIGMOD 6.4551616e-05
4,359 Astrid: Accurate Selectivity Estimation for String Predicates using Deep Learning 2021 VLDB 6.2569955e-05
4,438 Selectivity Estimation for Fuzzy String Predicates in Large Data Sets 2005 VLDB 6.1898903e-05
4,593 Auto-WLM: Machine Learning Enhanced Workload Management in Amazon Redshift 2023 SIGMOD 6.0606891e-05
4,908 Combining Small Language Models and Large Language Models for Zero-Shot NL2SQL 2024 VLDB 5.8339245e-05
5,073 Faerie: Efficient Filtering Algorithms for Approximate Dictionary-based Entity Extraction 2011 SIGMOD 5.7177424e-05
5,192 Pattern Functional Dependencies for Data Cleaning 2020 VLDB 5.6375087e-05
5,314 Can Learned Models Replace Hash Functions? 2023 VLDB 5.5724608e-05
5,401 ALECE: An Attention-based Learned Cardinality Estimator for SPJ Queries on Dynamic Workloads 2024 VLDB 5.5285035e-05
5,832 Stage: Query Execution Time Prediction in Amazon Redshift 2024 SIGMOD 5.3111109e-05
7,048 Magneto: Combining Small and Large Language Models for Schema Matching 2025 VLDB 4.8520651e-05
7,186 LPLM: A Neural Language Model for Cardinality Estimation of LIKE-Queries 2024 SIGMOD 4.8063731e-05
7,575 Human-in-the-loop Outlier Detection 2020 SIGMOD 4.7068909e-05
7,668 Human-in-the-loop Data Integration 2017 VLDB 4.6834075e-05
7,708 Efficient Top-k Algorithms for Approximate Substring Matching 2013 SIGMOD 4.6721808e-05
7,894 LITS: An Optimized Learned Index for Strings 2024 VLDB 4.6240341e-05
8,442 SageDB: An Instance-Optimized Data Analytics System 2022 VLDB 4.5120602e-05
8,948 One Seed, Two Birds: A Unified Learned Structure for Exact and Approximate Counting 2024 SIGMOD 4.423786e-05
9,301 Repairing Data through Regular Expressions 2016 VLDB 4.3587281e-05
9,726 Cardinality Estimation of LIKE Predicate Queries using Deep Learning 2025 SIGMOD 4.2943379e-05
9,943 Stop Word and Related Problems in Web Interface Integration 2009 VLDB 4.2456408e-05

Semantically Similar Papers