Database Paper Browser

QueryArtisan: Generating Data Manipulation Codes for Ad-hoc Analysis in Data Lakes

Summary: QueryArtisan is an LLM-driven system that generates just-in-time data-manipulation code, enabling natural-language ad-hoc queries directly over heterogeneous, schema-less data lakes through modality-aware operators. It integrates a cost-model-based optimizer to produce efficient operator plans, avoids ETL pipelines and predefined schemas, and outperforms prior LLM-based and ETL-based approaches. (summarized by gpt-5-mini on Feb 09 2026)

Paper ID: 13780
Venue: VLDB
Year: 2025
Pagerank: 4.2294678e-05
Overall Rank: 9,961 | 30.71%
DOI: 10.14778/3705829.3705832

Authors

Incoming Citations (Sorted by Pagerank)

Showing 2 of 2 citing papers.

Outgoing Citations (Sorted by Pagerank)

Showing 17 of 17 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
369 | Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation | 2024 | VLDB | 0.0002547515
513 | TURL: Table Understanding through Representation Learning | 2021 | VLDB | 0.00021288342
610 | Goods: Organizing Google's Datasets | 2016 | SIGMOD | 0.00019232674
939 | Data Lake Management: Challenges and Opportunities | 2019 | VLDB | 0.00015187344
1,178 | Table Union Search on Open Data | 2018 | VLDB | 0.00013468118
1,187 | JOSIE: Overlap Set Similarity Search for Finding Joinable Tables in Data Lakes | 2019 | SIGMOD | 0.00013443639
1,277 | The Data Civilizer System | 2017 | CIDR | 0.00012879695
1,643 | CodexDB: Synthesizing Code for Query Processing from Natural Language Instructions using GPT-3 Codex | 2022 | VLDB | 0.0001104256
1,664 | On Multi-Column Foreign Key Discovery | 2010 | VLDB | 0.00010976887
2,836 | Semantics-aware Dataset Discovery from Data Lakes with Contextualized Column-based Representation Learning | 2023 | VLDB | 8.0443826e-05
3,281 | Constance: An Intelligent Data Lake System | 2016 | SIGMOD | 7.2823287e-05
3,908 | Progressive and Selective Merge: Computing Top-K with Ad-hoc Ranking Functions | 2007 | SIGMOD | 6.6392878e-05
3,942 | Ember: No-Code Context Enrichment via Similarity-Based Keyless Joins | 2022 | VLDB | 6.6114622e-05
4,859 | Integrating Data Lake Tables | 2023 | VLDB | 5.8732433e-05
4,958 | Efficient Subgraph Search over Large Uncertain Graphs | 2011 | VLDB | 5.8031038e-05
6,165 | When the Web is your Data Lake: Creating a Search Engine for Datasets on the Web | 2020 | SIGMOD | 5.1728052e-05
7,643 | Cross Modal Data Discovery over Structured and Unstructured Data Lakes | 2023 | VLDB | 4.6901105e-05

Semantically Similar Papers