Database Paper Browser


Astrid: Accurate Selectivity Estimation for String Predicates using Deep Learning

Summary: Astrid combines traditional pruning-based sketches with deep learning to estimate the selectivity of prefix, substring, and suffix string predicates. It introduces query-type-aware embeddings and a revised neural language-model objective trained with an efficient optimizer, and reports state-of-the-art results on benchmarks. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 12567
Venue: VLDB
Year: 2021
Pagerank: 6.2569955e-05
Overall Rank: 4,359 | 69.68%
DOI: 10.14778/3436905.3436907



Incoming Citations (Sorted by Pagerank)

Showing 16 of 16 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
1,638 | Cardinality Estimation in DBMS: A Comprehensive Benchmark Evaluation | 2022 | VLDB | 0.00011049779
2,121 | Balsa: Learning a Query Optimizer Without Expert Demonstrations | 2022 | SIGMOD | 9.5017232e-05
3,169 | QueryFormer: A Tree Transformer Model for Query Plan Representation | 2022 | VLDB | 7.4498425e-05
3,266 | Learned Cardinality Estimation: An In-depth Study | 2022 | SIGMOD | 7.3074684e-05
4,417 | Robust Query Driven Cardinality Estimation under Changing Workloads | 2023 | VLDB | 6.2037371e-05
6,328 | A Comparative Study and Component Analysis of Query Plan Representation Techniques in ML4DB Studies | 2024 | VLDB | 5.1082882e-05
7,011 | Simple Adaptive Query Processing vs. Learned Query Optimizers: Observations and Analysis | 2023 | VLDB | 4.8629458e-05
7,126 | Debunking the Myth of Join Ordering: Toward Robust SQL Analytics | 2025 | SIGMOD | 4.8232367e-05
7,186 | LPLM: A Neural Language Model for Cardinality Estimation of LIKE-Queries | 2024 | SIGMOD | 4.8063731e-05
7,474 | Cardinality Estimation of Approximate Substring Queries using Deep Learning | 2022 | VLDB | 4.7194345e-05
9,726 | Cardinality Estimation of LIKE Predicate Queries using Deep Learning | 2025 | SIGMOD | 4.2943379e-05
9,728 | SPACE: Cardinality Estimation for Path Queries Using Cardinality-Aware Sequence-based Learning | 2025 | SIGMOD | 4.2942813e-05
9,945 | SSCard: Substring Cardinality Estimation using Suffix Tree-Guided Learned FM-Index | 2026 | SIGMOD | 4.2432653e-05
9,960 | An Elephant Under The Microscope: Analyzing The Interaction Of Optimizer Components In PostgreSQL | 2025 | SIGMOD | 4.2294678e-05
10,216 | The Case For Language Model Approximated LIKE Predicate | 2026 | SIGMOD | 4.1945683e-05
10,219 | Practical Parameterized Query Optimization via Efficient Plan Reuse and List-wise Ranking | 2026 | SIGMOD | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 33 of 33 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
64 | Improved Histograms for Selectivity Estimation of Range Predicates | 1996 | SIGMOD | 0.00063612837
71 | How Good Are Query Optimizers, Really? | 2016 | VLDB | 0.00059038975
92 | Practical Selectivity Estimation through Adaptive Sampling | 1990 | SIGMOD | 0.00051315959
102 | The Case for Learned Index Structures | 2018 | SIGMOD | 0.00049545203
204 | Learned Cardinalities: Estimating Correlated Joins with Deep Learning | 2019 | CIDR | 0.00034784455
222 | Wavelet-Based Histograms for Selectivity Estimation | 1998 | SIGMOD | 0.00032828302
260 | Fast Exact Shortest-Path Distance Queries on Large Networks by Pruned Landmark Labeling | 2013 | SIGMOD | 0.00030040036
325 | The History of Histograms (abridged) | 2003 | VLDB | 0.00027378328
326 | Optimal Histograms with Quality Guarantees | 1998 | VLDB | 0.00027358981
333 | Neo: A Learned Query Optimizer | 2019 | VLDB | 0.00027206884
372 | Selectivity Estimation using Probabilistic Models | 2001 | SIGMOD | 0.00025354779
512 | STHoles: A Multidimensional Workload-Aware Histogram | 2001 | SIGMOD | 0.00021380733
608 | DeepDB: Learn from Data, not from Queries! | 2020 | VLDB | 0.00019235898
754 | Distributed Representations of Tuples for Entity Resolution | 2018 | VLDB | 0.00017117211
758 | Deep Unsupervised Cardinality Estimation | 2020 | VLDB | 0.0001706608
801 | SageDB: A Learned Database System | 2019 | CIDR | 0.00016505496
806 | An End-to-End Learning-based Cost Estimator | 2020 | VLDB | 0.00016434274
852 | Dynamic Multidimensional Histograms | 2002 | SIGMOD | 0.00015941524
1,105 | Cardinality Estimation Done Right: Index-Based Join Sampling | 2017 | CIDR | 0.00013990395
1,146 | Estimating Alphanumeric Selectivity in the Presence of Wildcards | 1996 | SIGMOD | 0.00013679782
1,202 | VGRAM: Improving Performance of Approximate Queries on String Collections Using Variable-Length Grams | 2007 | VLDB | 0.00013326298
1,254 | Selectivity Estimation for Range Predicates using Lightweight Models | 2019 | VLDB | 0.00013027411
1,379 | Substring Selectivity Estimation | 1999 | PODS | 0.00012286879
1,547 | Lightweight Graphical Models for Selectivity Estimation Without Independence Assumptions | 2011 | VLDB | 0.00011442359
1,914 | Creating Embeddings of Heterogeneous Relational Datasets for Data Integration Tasks | 2020 | SIGMOD | 0.00010109102
1,981 | Improved Selectivity Estimation by Combining Knowledge from Sampling and Synopses | 2018 | VLDB | 9.8687545e-05
2,171 | Selectivity Estimation For Boolean Queries | 2000 | PODS | 9.3807165e-05
2,193 | Cost-Based Variable-Length-Gram Selection for String Collections to Support Approximate Queries Efficiently | 2008 | SIGMOD | 9.3178557e-05
2,364 | Deep Learning Models for Selectivity Estimation of Multi-Attribute Queries | 2020 | SIGMOD | 8.9554751e-05
2,841 | Selectivity Estimation in Extensible Databases - A Neural Network Approach | 1998 | VLDB | 8.0287389e-05
2,969 | Estimating Join Selectivities using Bandwidth-Optimized Kernel Density Models | 2017 | VLDB | 7.7974762e-05
3,226 | Extending Q-Grams to Estimate Selectivity of String Matching with Low Edit Distance | 2007 | VLDB | 7.3433307e-05
3,651 | Conditional Selectivity for Statistics on Query Expressions | 2004 | SIGMOD | 6.8768678e-05
