Pneuma: Leveraging LLMs for Tabular Data Representation and Retrieval in an End-to-End System

Summary: Pneuma is an end-to-end retrieval-augmented generation (RAG) system that uses LLMs to represent and retrieve tabular data, preserving schema and row context for accurate data discovery. Evaluated on six real-world datasets, it outperforms both full-text search and state-of-the-art RAG in accuracy and efficiency. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 7257
Venue: SIGMOD
Year: 2025
Pagerank: 5.3387063e-05
Overall Rank: 5,756 | 60.00%
DOI: 10.1145/3725337

[Chart: Incoming Non-self Citations Over Time]

Incoming Citations (Sorted by Pagerank)

Showing 6 of 6 citing papers.

Rank	Citing Paper	Year	Venue	Pagerank
9,990	The Pneuma Project: Reifying Information Needs as Relational Schemas to Automate Discovery, Guide Preparation, and Align Data with Intent	2026	CIDR	4.1905499e-05
10,142	AutoDDG: Automated Dataset Description Generation using Large Language Models	2026	SIGMOD	4.1905499e-05
10,215	Task Cascades for Efficient Unstructured Data Processing	2026	SIGMOD	4.1905499e-05
10,285	Relational Deep Dive: Error-Aware Queries Over Unstructured Data	2026	VLDB	4.1905499e-05
10,332	ELT-Bench: An End-to-End Benchmark for Evaluating AI Agents on ELT Pipelines	2026	VLDB	4.1905499e-05
10,341	Revisiting Task-Oriented Dataset Search in the Era of Large Language Models: Challenges, Benchmark, and Solution	2026	VLDB	4.1905499e-05

Outgoing Citations (Sorted by Pagerank)

Showing 8 of 8 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank	Cited Paper	Year	Venue	Pagerank
1,742	Auctus: A Dataset Search Engine for Data Discovery and Augmentation	2021	VLDB	0.00010695388
1,866	ReAcTable: Enhancing ReAct for Table Question Answering	2024	VLDB	0.00010265592
2,013	Palimpzest: Optimizing AI-Powered Analytics with Declarative Query Processing	2025	CIDR	9.7986166e-05
2,267	Ground: A Data Context Service	2017	CIDR	9.1554363e-05
3,189	Text2SQL is Not Enough: Unifying AI and Databases with TAG	2025	CIDR	7.4140094e-05
3,639	The Design of an LLM-powered Unstructured Analytics System	2025	CIDR	6.8886648e-05
4,970	Leva: Boosting Machine Learning Performance with Relational Embedding Data Augmentation	2022	SIGMOD	5.7900867e-05
7,869	Solo: Data Discovery Using Natural Language Questions Via A Self-Supervised Approach	2023	SIGMOD	4.6275089e-05

Semantically Similar Papers

Overall Rank	Paper	Year	Venue	Pagerank
10,434	Andromeda: Debugging Database Performance Issues with Retrieval-Augmented Large Language Models	2025	SIGMOD	4.1905499e-05
10,285	Relational Deep Dive: Error-Aware Queries Over Unstructured Data	2026	VLDB	4.1905499e-05
10,210	SchemaRAG: A Schema-aware Retrieval-Augmented Generation Framework for Text-to-SQL	2026	SIGMOD	4.1905499e-05
9,405	TabulaX: Leveraging Large Language Models for Multi-Class Table Transformations	2025	VLDB	4.3399748e-05
4,735	AutoTQA: Towards Autonomous Tabular Question Answering through Multi-Agent Large Language Models	2024	VLDB	5.9538651e-05
10,828	TableCopilot: A Table Assistant Empowered by Natural Language Conditional Table Discovery	2025	VLDB	4.1905499e-05
6,202	Chat2Data: An Interactive Data Analysis System with RAG, Vector Databases and LLMs	2024	VLDB	5.1554849e-05
10,976	Unstructured Data Fusion for Schema and Data Extraction	2024	SIGMOD	4.1905499e-05
7,016	LLM for Data Management	2024	VLDB	4.8561622e-05
9,990	The Pneuma Project: Reifying Information Needs as Relational Schemas to Automate Discovery, Guide Preparation, and Align Data with Intent	2026	CIDR	4.1905499e-05