Sentence to Model: Cost-Effective Data Collection LLM Agent
Summary: A four-stage, cost-aware data-collection pipeline (Explorer, Prioritizer, Extractor, Modeling) that produces both a dataset and a trained model from web sources, knowledge graphs, and internal networks. Budget-driven prioritization and LLM-based enrichment balance source selection, rate limits, and latency, enabling rapid end-to-end delivery from data to model in a human-machine loop. (summarized by gpt-5-nano on Feb 09 2026)
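The summary above describes the pipeline only at a high level. The following Python is a minimal sketch of how the four stages might chain together, assuming a greedy value-per-cost rule for the budget-driven Prioritizer. The `Source` fields, the keyword-matching Explorer, the `fetch` callback, and the majority-label "model" are all hypothetical stand-ins chosen for illustration, not the authors' implementation.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable


@dataclass
class Source:
    """A candidate data source surfaced by the Explorer (hypothetical fields)."""
    name: str
    kind: str          # e.g. "web", "knowledge_graph", "internal"
    est_value: float   # assumed estimate of usefulness for the target task
    est_cost: float    # assumed cost units (API calls, tokens, latency)
    rate_limit: int    # assumed max records fetched from this source per run


def explore(query: str, catalog: list[Source]) -> list[Source]:
    """Explorer (sketch): keep sources whose name shares a term with the query."""
    terms = set(query.lower().split())
    return [s for s in catalog if terms & set(s.name.lower().split("_"))]


def prioritize(candidates: list[Source], budget: float) -> list[Source]:
    """Prioritizer (sketch): greedy value-per-cost selection under a cost budget."""
    ranked = sorted(candidates, key=lambda s: s.est_value / s.est_cost, reverse=True)
    chosen: list[Source] = []
    spent = 0.0
    for s in ranked:
        if spent + s.est_cost <= budget:  # skip sources that would overrun the budget
            chosen.append(s)
            spent += s.est_cost
    return chosen


def extract(sources: list[Source], fetch: Callable[[Source], list[dict]]) -> list[dict]:
    """Extractor (sketch): pull records, truncated to each source's rate limit."""
    rows: list[dict] = []
    for s in sources:
        rows.extend(fetch(s)[: s.rate_limit])
    return rows


def train(rows: list[dict]) -> Callable[[dict], str]:
    """Modeling (placeholder): a majority-label baseline instead of a real learner."""
    if not rows:
        raise ValueError("no data collected within budget")
    majority = Counter(r["label"] for r in rows).most_common(1)[0][0]
    return lambda _features: majority


def sentence_to_model(query: str, catalog: list[Source], budget: float,
                      fetch: Callable[[Source], list[dict]]):
    """Chain the four stages and return (dataset, model)."""
    selected = prioritize(explore(query, catalog), budget)
    dataset = extract(selected, fetch)
    return dataset, train(dataset)


if __name__ == "__main__":
    catalog = [
        Source("city_population_web", "web", est_value=0.9, est_cost=3.0, rate_limit=2),
        Source("city_kg", "knowledge_graph", est_value=0.8, est_cost=1.0, rate_limit=5),
        Source("sales_internal", "internal", est_value=0.5, est_cost=1.0, rate_limit=5),
    ]
    # Stub fetcher standing in for real web / KG / internal-network access.
    fetch = lambda s: [{"source": s.name, "label": "large"},
                       {"source": s.name, "label": "small"},
                       {"source": s.name, "label": "large"}]
    dataset, model = sentence_to_model("city population", catalog, budget=2.0, fetch=fetch)
    print(len(dataset), model({}))  # prints: 3 large
```

With a budget of 2.0, only `city_kg` fits (ratio 0.8 versus 0.3 for the web source, whose cost of 3.0 would overrun the budget). Greedy ratio selection is a common heuristic for budgeted, knapsack-style choices but is not guaranteed optimal; the paper's actual prioritization strategy may differ.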
Incoming Non-self Citations Over Time
No non-self incoming citations found for this paper in this database.
Authors
1. Yael Einy
2. Guy Dar
3. Slava Novgorodov
4. Tova Milo
Incoming Citations (Sorted by Pagerank)
Showing 0 of 0 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
Outgoing Citations (Sorted by Pagerank)
Showing 3 of 3 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Overall Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1,116 | Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes | 2024 | VLDB | 1.3890154e-04 |
| 1,541 | Symphony: Towards Natural Language Query Answering over Multi-modal Data Lakes | 2023 | CIDR | 1.1456579e-04 |
| 10,881 | Datamap-Driven Tabular Coreset Selection for Classifier Training | 2025 | VLDB | 4.1945683e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1,963 | DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing | 2025 | VLDB | 9.929429e-05 |
| 10,973 | Unstructured Data Fusion for Schema and Data Extraction | 2024 | SIGMOD | 4.1945683e-05 |
| 10,452 | ScaleLLM: A Technique for Scalable LLM-augmented Data Systems | 2025 | SIGMOD | 4.1945683e-05 |
| 10,064 | Cut Costs, Not Accuracy: LLM-Powered Data Processing with Guarantees | 2026 | SIGMOD | 4.1945683e-05 |
| 10,595 | Optimized Batch Prompting for Cost-effective LLMs | 2025 | VLDB | 4.1945683e-05 |
| 3,840 | Revisiting Prompt Engineering via Declarative Crowdsourcing | 2024 | CIDR | 6.7106924e-05 |
| 1,116 | Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes | 2024 | VLDB | 1.3890154e-04 |
| 10,316 | LLM-AutoDP: Automatic Data Processing via LLM Agents for Model Fine-tuning | 2026 | VLDB | 4.1945683e-05 |
| 7,020 | LLM for Data Management | 2024 | VLDB | 4.8595728e-05 |
| 9,219 | Intelligent Agents for Data Exploration | 2024 | VLDB | 4.3702863e-05 |