Towards Automated Cross-domain Exploratory Data Analysis through Large Language Models
Summary: TiInsight is an end-to-end, LLM-driven, cross-domain exploratory data analysis (EDA) system over SQL databases. It introduces a hierarchical data context (HDC) that summarizes the database schema and enables open-world generalization. The system runs a four-stage pipeline (HDC generation, question clarification and decomposition, TiSQL text-to-SQL, and TiChart visualization) and is deployed in production with open APIs. It achieves 86.3% execution accuracy on Spider (with GPT-4) and shows strong gains over experts in a user study.
(summarized by gpt-5-mini on Feb 09 2026)
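The 86.3% figure is an execution-accuracy score: a generated query counts as correct when running it returns the same result as the gold query, even if the SQL text differs. The sketch below is a minimal illustration using Python's built-in `sqlite3`, assuming an order-insensitive (multiset) comparison of result rows. The official Spider evaluation handles more cases (for example, order-sensitive comparison when the gold query uses ORDER BY), and the function names and toy `singer` table here are hypothetical, not TiInsight's code.

```python
import sqlite3
from collections import Counter


def run_query(conn: sqlite3.Connection, sql: str):
    """Execute sql and return all result rows, or None if the query fails."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None


def execution_match(conn: sqlite3.Connection, predicted_sql: str, gold_sql: str) -> bool:
    """True if the predicted query returns the same multiset of rows as the gold query."""
    gold = run_query(conn, gold_sql)
    pred = run_query(conn, predicted_sql)
    if gold is None or pred is None:
        return False
    # Simplifying assumption: ignore row order, but keep duplicate counts.
    return Counter(pred) == Counter(gold)


def execution_accuracy(conn: sqlite3.Connection, pairs) -> float:
    """Fraction of (predicted_sql, gold_sql) pairs whose execution results match."""
    if not pairs:
        return 0.0
    hits = sum(execution_match(conn, pred, gold) for pred, gold in pairs)
    return hits / len(pairs)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE singer (name TEXT, age INTEGER, country TEXT)")
    conn.executemany(
        "INSERT INTO singer VALUES (?, ?, ?)",
        [("Ann", 30, "France"), ("Bo", 25, "Spain"), ("Cy", 41, "France")],
    )
    pairs = [
        # Different SQL text, same result rows: counted as correct.
        ("SELECT COUNT(name) FROM singer WHERE age >= 29",
         "SELECT COUNT(*) FROM singer WHERE age > 28"),
        # Runs, but returns different rows: counted as wrong.
        ("SELECT name FROM singer WHERE age > 40",
         "SELECT name FROM singer WHERE age > 28"),
        # Fails to execute (no such column): counted as wrong.
        ("SELECT nme FROM singer",
         "SELECT name FROM singer"),
    ]
    print(f"{execution_accuracy(conn, pairs):.3f}")  # prints 0.333
```

Comparing results rather than SQL strings is what lets semantically equivalent but textually different queries, like the first pair above, count as correct.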
- Paper ID: 14109
- Venue: VLDB
- Year: 2025
- Pagerank: 4.1945683e-05
- Overall Rank: 10,784 (24.98%)
- DOI: 10.14778/3750601.3750629
Incoming Non-self Citations Over Time
No non-self incoming citations found for this paper in this database.
Incoming Citations (Sorted by Pagerank)
Showing 0 of 0 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 28 of 28 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 369 | Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation | 2024 | VLDB | 0.0002547515 |
| 460 | SeeDB: Efficient Data-Driven Visualization Recommendations to Support Visual Analytics | 2015 | VLDB | 0.00022516069 |
| 535 | ATHENA: An Ontology-Driven System for Natural Language Querying over Relational Data Stores | 2016 | VLDB | 0.00020727678 |
| 984 | Natural language to SQL: Where are we today? | 2020 | VLDB | 0.00014857465 |
| 998 | CodeS: Towards Building Open-source Language Models for Text-to-SQL | 2024 | SIGMOD | 0.00014729379 |
| 1,350 | Northstar: An Interactive Data Science System | 2018 | VLDB | 0.00012431059 |
| 1,430 | Duoquest: A Dual-Specification System for Expressive SQL Queries | 2020 | SIGMOD | 0.00012031061 |
| 1,552 | Overview of Data Exploration Techniques | 2015 | SIGMOD | 0.00011408814 |
| 2,945 | Few-shot Text-to-SQL Translation using Structure and Content Prompt Learning | 2023 | SIGMOD | 7.8377395e-05 |
| 2,988 | NL2SQL is a solved problem... Not! | 2024 | CIDR | 7.7761714e-05 |
| 3,393 | Lux: Always-on Visualization Recommendations for Exploratory Dataframe Workflows | 2022 | VLDB | 7.1483239e-05 |
| 3,546 | Extracting Top-K Insights from Multi-dimensional Data | 2017 | SIGMOD | 6.9870745e-05 |
| 3,661 | Example-Driven Query Intent Discovery: Abductive Reasoning using Semantic Similarity | 2019 | VLDB | 6.8689912e-05 |
| 3,970 | HAIChart: Human and AI Paired Visualization System | 2024 | VLDB | 6.5784767e-05 |
| 4,540 | Automating Exploratory Data Analysis via Machine Learning: An Overview | 2020 | SIGMOD | 6.1033443e-05 |
| 4,739 | AutoTQA: Towards Autonomous Tabular Question Answering through Multi-Agent Large Language Models | 2024 | VLDB | 5.959592e-05 |
| 4,908 | Combining Small Language Models and Large Language Models for Zero-Shot NL2SQL | 2024 | VLDB | 5.8339245e-05 |
| 5,033 | FinSQL: Model-Agnostic LLMs-based Text-to-SQL Framework for Financial Analysis | 2024 | SIGMOD | 5.7486224e-05 |
| 5,217 | QuickInsights: Quick and Automatic Discovery of Insights from Multi-Dimensional Data | 2019 | SIGMOD | 5.6227959e-05 |
| 5,313 | XInsight: eXplainable Data Analysis Through The Lens of Causality | 2023 | SIGMOD | 5.573009e-05 |
| 5,981 | DataPrep.EDA: Task-Centric Exploratory Data Analysis for Statistical Modeling in Python | 2021 | SIGMOD | 5.2448986e-05 |
| 7,989 | RCRank: Multimodal Ranking of Root Causes of Slow Queries in Cloud Database Systems | 2025 | VLDB | 4.6124681e-05 |
| 8,388 | FEDEX: An Explainability Framework for Data Exploration Steps | 2022 | VLDB | 4.5297787e-05 |
| 8,996 | MetaInsight: Automatic Discovery of Structured Knowledge for Exploratory Data Analysis | 2021 | SIGMOD | 4.4124959e-05 |
| 9,219 | Intelligent Agents for Data Exploration | 2024 | VLDB | 4.3702863e-05 |
| 9,829 | Sevi: Speech-to-Visualization through Neural Machine Translation | 2022 | SIGMOD | 4.2751057e-05 |
| 9,830 | Towards Autonomous, Hands-Free Data Exploration | 2020 | CIDR | 4.2751057e-05 |
| 10,460 | UNITQA: A Unified Automated Tabular Question Answering System with Multi-Agent Large Language Models | 2025 | SIGMOD | 4.1945683e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 10,268 | OpenSQL: Data-Efficient Text-to-SQL for Open-Source LLMs via Synthesized Intermediate Supervision | 2026 | VLDB | 4.1945683e-05 |
| 3,859 | OpenSearch-SQL: Enhancing Text-to-SQL with Dynamic Few-shot and Consistency Alignment | 2025 | SIGMOD | 6.6907933e-05 |
| 8,736 | Unveiling Challenges for LLMs in Enterprise Data Engineering | 2026 | VLDB | 4.456315e-05 |
| 7,354 | Reliable Text-to-SQL with Adaptive Abstention | 2025 | SIGMOD | 4.7529612e-05 |
| 10,221 | NL2SQLBench: A Modular Benchmarking Framework for LLM-Enabled NL2SQL Solutions | 2026 | VLDB | 4.1945683e-05 |
| 6,389 | Chat2Data: An Interactive Data Analysis System with RAG, Vector Databases and LLMs | 2024 | VLDB | 5.0844009e-05 |
| 10,897 | Welding Natural Language Queries to Analytics IRs with LLMs | 2024 | CIDR | 4.1945683e-05 |
| 8,155 | Automated Data Visualization from Natural Language via Large Language Models: An Exploratory Study | 2024 | SIGMOD | 4.5745248e-05 |
| 369 | Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation | 2024 | VLDB | 0.0002547515 |
| 4,739 | AutoTQA: Towards Autonomous Tabular Question Answering through Multi-Agent Large Language Models | 2024 | VLDB | 5.959592e-05 |