Pneuma: Leveraging LLMs for Tabular Data Representation and Retrieval in an End-to-End System
Summary: Pneuma is an end-to-end RAG system using LLMs to represent and retrieve tabular data, preserving schema and row context for accurate discovery. Evaluated on six real-world datasets, it outperforms full-text search and state-of-the-art RAG in accuracy and efficiency. (summarized by gpt-5-nano on Feb 09 2026)
Incoming Citations (Sorted by Pagerank)
Showing 5 of 5 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 9,991 | The Pneuma Project: Reifying Information Needs as Relational Schemas to Automate Discovery, Guide Preparation, and Align Data with Intent | 2026 | CIDR | 4.1945683e-05 |
| 10,142 | AutoDDG: Automated Dataset Description Generation using Large Language Models | 2026 | SIGMOD | 4.1945683e-05 |
| 10,215 | Task Cascades for Efficient Unstructured Data Processing | 2026 | SIGMOD | 4.1945683e-05 |
| 10,320 | ELT-Bench: An End-to-End Benchmark for Evaluating AI Agents on ELT Pipelines | 2026 | VLDB | 4.1945683e-05 |
| 10,329 | Revisiting Task-Oriented Dataset Search in the Era of Large Language Models: Challenges, Benchmark, and Solution | 2026 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 8 of 8 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1,751 | Auctus: A Dataset Search Engine for Data Discovery and Augmentation | 2021 | VLDB | 0.00010683295 |
| 1,872 | ReAcTable: Enhancing ReAct for Table Question Answering | 2024 | VLDB | 0.00010259702 |
| 2,106 | Palimpzest: Optimizing AI-Powered Analytics with Declarative Query Processing | 2025 | CIDR | 9.5342543e-05 |
| 2,269 | Ground: A Data Context Service | 2017 | CIDR | 9.147379e-05 |
| 3,359 | Text2SQL is Not Enough: Unifying AI and Databases with TAG | 2025 | CIDR | 7.1744146e-05 |
| 3,876 | The Design of an LLM-powered Unstructured Analytics System | 2025 | CIDR | 6.6741456e-05 |
| 4,967 | Leva: Boosting Machine Learning Performance with Relational Embedding Data Augmentation | 2022 | SIGMOD | 5.7956612e-05 |
| 7,868 | Solo: Data Discovery Using Natural Language Questions Via A Self-Supervised Approach | 2023 | SIGMOD | 4.6319504e-05 |