TASTI: Semantic Indexes for Machine Learning-based Queries over Unstructured Data
Summary: Proposes TASTI, a trainable semantic index that replaces per-query proxy models with learned embeddings, so that records close in embedding space share model outputs. The paper theoretically ties embedding error to query accuracy; empirically, on five multimodal datasets, TASTI builds indexes 10x more cheaply and answers proxy-based queries 24x faster.
(summarized by gpt-5-nano on Feb 09 2026)
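The summary's core mechanism (embed every record once, run the expensive model only on a few representative records, and let nearby records reuse those outputs) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes precomputed embeddings, Euclidean distance, furthest-point-first selection of representatives, and inverse-distance weighting over the k nearest representatives. All function names and the toy data are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two embeddings (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def furthest_point_first(embeddings: List[Vector], n_reps: int) -> List[int]:
    """Greedily choose representatives that are far from those already chosen."""
    reps = [0]  # start from an arbitrary record
    # min_dist[i]: distance from record i to its closest chosen representative
    min_dist = [distance(e, embeddings[0]) for e in embeddings]
    while len(reps) < min(n_reps, len(embeddings)):
        nxt = max(range(len(embeddings)), key=lambda i: min_dist[i])
        if min_dist[nxt] == 0.0:
            break  # every record already coincides with a representative
        reps.append(nxt)
        min_dist = [min(d, distance(e, embeddings[nxt]))
                    for d, e in zip(min_dist, embeddings)]
    return reps


def propagate_scores(embeddings: List[Vector], reps: List[int],
                     rep_scores: List[float], k: int = 1) -> List[float]:
    """Estimate each record's score from its k nearest annotated
    representatives, weighting each one by inverse distance."""
    scores = []
    for e in embeddings:
        nearest = sorted((distance(e, embeddings[r]), s)
                         for r, s in zip(reps, rep_scores))[:k]
        if nearest[0][0] == 0.0:
            scores.append(nearest[0][1])  # record sits on a representative
            continue
        weights = [1.0 / d for d, _ in nearest]
        scores.append(sum(w * s for w, (_, s) in zip(weights, nearest))
                      / sum(weights))
    return scores


# Toy 2-D embeddings forming two tight clusters of records.
embeddings = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]


def expensive_model(i: int) -> float:
    """Hypothetical stand-in for the costly target model (e.g. an object count)."""
    return 3.0 if embeddings[i][0] > 2.5 else 0.0


reps = furthest_point_first(embeddings, n_reps=2)
rep_scores = [expensive_model(r) for r in reps]  # only 2 of 6 records hit the model
proxy_scores = propagate_scores(embeddings, reps, rep_scores, k=1)
```

In this toy run only the two representatives are sent to `expensive_model`; the other four records inherit scores from their nearest representative, which is the sense in which similar records share outputs. Raising `k` smooths the estimates across neighboring representatives at the cost of a few more distance computations.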
- Paper ID: 6349
- Venue: SIGMOD
- Year: 2022
- Pagerank: 6.137686e-05
- Overall Rank: 4,501 | 68.69%
- DOI: 10.1145/3514221.3517897
Incoming Citations (Sorted by Pagerank)
Showing 15 of 15 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 4,567 | Optimizing Video Analytics with Declarative Model Relationships | 2023 | VLDB | 6.080526e-05 |
| 5,214 | ThalamusDB: Approximate Query Processing on Multi-Modal Data | 2024 | SIGMOD | 5.624434e-05 |
| 5,658 | Databases Unbound: Querying All of the World's Bytes with AI | 2024 | VLDB | 5.385675e-05 |
| 6,877 | Extract-Transform-Load for Video Streams | 2023 | VLDB | 4.8974054e-05 |
| 7,338 | Aero: Adaptive Query Processing of ML Queries | 2025 | SIGMOD | 4.7584583e-05 |
| 7,928 | Accelerating Aggregation Queries on Unstructured Streams of Data | 2023 | VLDB | 4.613455e-05 |
| 8,383 | EQUI-VOCAL: Synthesizing Queries for Compositional Video Events from Limited User Interactions | 2023 | VLDB | 4.5307128e-05 |
| 9,765 | TVM: A Tile-based Video Management Framework | 2024 | VLDB | 4.2856106e-05 |
| 10,064 | Cut Costs, Not Accuracy: LLM-Powered Data Processing with Guarantees | 2026 | SIGMOD | 4.1945683e-05 |
| 10,215 | Task Cascades for Efficient Unstructured Data Processing | 2026 | SIGMOD | 4.1945683e-05 |
| 10,382 | MAST: Towards Efficient Analytical Query Processing on Point Cloud Data | 2025 | SIGMOD | 4.1945683e-05 |
| 10,503 | Self-Enhancing Video Data Management System for Compositional Events with Large Language Models | 2025 | SIGMOD | 4.1945683e-05 |
| 10,523 | Scalable Complex Event Processing on Video Streams | 2025 | SIGMOD | 4.1945683e-05 |
| 10,944 | Predictive and Near-Optimal Sampling for View Materialization in Video Databases | 2024 | SIGMOD | 4.1945683e-05 |
| 11,061 | Optimizing Video Queries with Declarative Clues | 2024 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 16 of 16 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 5,072 | Optimizing Machine Learning Inference Queries with Correlative Proxy Models | 2022 | VLDB | 5.7185674e-05 |
| 10,204 | Reveal Hidden Pitfalls and Navigate Next Generation of Vector Similarity Search from Task-Centric Views: [Experiments & Analysis] | 2026 | SIGMOD | 4.1945683e-05 |
| 4,092 | Structured Annotations of Web Queries | 2010 | SIGMOD | 6.4561959e-05 |
| 10,752 | QUEST: Query Optimization in Unstructured Document Analysis | 2025 | VLDB | 4.1945683e-05 |
| 10,288 | TATA: An Efficient Framework for Task Transfer in Query Plan Representation | 2026 | VLDB | 4.1945683e-05 |
| 8,672 | Optimizing Video Selection LIMIT Queries With Commonsense Knowledge | 2024 | VLDB | 4.4710897e-05 |
| 6,082 | Query-Sensitive Embeddings | 2005 | SIGMOD | 5.2205711e-05 |
| 3,604 | Spatial and Temporal Constrained Ranked Retrieval over Videos | 2022 | VLDB | 6.9301368e-05 |
| 7,928 | Accelerating Aggregation Queries on Unstructured Streams of Data | 2023 | VLDB | 4.613455e-05 |
| 11,426 | Accelerating Queries over Unstructured Data with ML | 2021 | CIDR | 4.1945683e-05 |