The Design of an LLM-powered Unstructured Analytics System
Summary: Aryn compiles natural-language (NL) queries into semantic plans that are executed by Sycamore, a distributed declarative engine whose DocSets are used to analyze, enrich, and transform large collections of unstructured documents. Luna (NL→Sycamore) and DocParse (PDF→DocSet) improve accuracy over RAG on NTSB reports and surface explainable execution traces to build trust. (summarized by gpt-5-mini on Feb 09 2026)
Authors
1. Eric Anderson
2. Jonathan Fritz
3. Austin Lee
4. Bohou Li
5. Mark Lindblad
6. Henry Lindeman
7. Alex Meyer
8. Parthkumar Parmar
9. Tanvi Ranade
10. Mehul A. Shah
11. Benjamin Sowell
12. Dan Tecuci
13. Vinayak Thapliyal
14. Matt Welsh
Incoming Citations (Sorted by Pagerank)
Showing 12 of 12 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 9 of 9 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 221 | Deep Entity Matching with Pre-Trained Language Models | 2021 | VLDB | 0.00033121824 |
| 517 | Can Foundation Models Wrangle Your Data? | 2023 | VLDB | 0.00021169035 |
| 984 | Natural language to SQL: Where are we today? | 2020 | VLDB | 0.00014857465 |
| 1,082 | CAESURA: Language Models as Multi-Modal Query Planners | 2024 | CIDR | 0.00014214232 |
| 1,116 | Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes | 2024 | VLDB | 0.00013890154 |
| 1,963 | DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing | 2025 | VLDB | 0.00009929429 |
| 2,517 | Annotating Columns with Pre-trained Language Models | 2022 | SIGMOD | 0.000086092139 |
| 3,015 | Chorus: Foundation Models for Unified Data Discovery and Exploration | 2024 | VLDB | 0.000077092391 |
| 3,359 | Text2SQL is Not Enough: Unifying AI and Databases with TAG | 2025 | CIDR | 0.000071744146 |