Navigating the Data Lake with DATAMARAN: Automatically Extracting Structure from Log Datasets
Summary: Datamaran automatically extracts structure from semi-structured log datasets, identifying field and record endpoints and separating structured content from unstructured noise. It discovers structure without requiring record boundaries to be known in advance, achieving 95% extraction accuracy on log datasets collected from GitHub, about 66% higher than prior unsupervised schemes. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Yihan Gao
2. Silu Huang
3. Aditya Parameswaran
Incoming Citations (Sorted by Pagerank)
Showing 8 of 8 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 939 | Data Lake Management: Challenges and Opportunities | 2019 | VLDB | 0.00015187344 |
| 3,252 | Auto-Suggest: Learning-to-Recommend Data Preparation Steps Using Data Science Notebooks | 2020 | SIGMOD | 7.3178277e-05 |
| 3,259 | AS-Parser: Log Parsing Based on Adaptive Segmentation | 2023 | SIGMOD | 7.3147783e-05 |
| 5,275 | Auto-Tables: Synthesizing Multi-Step Transformations to Relationalize Tables without Using Examples | 2023 | VLDB | 5.5905507e-05 |
| 5,280 | Explaining Dataset Changes for Semantic Data Versioning with Explain-Da-V | 2023 | VLDB | 5.5896735e-05 |
| 8,088 | PIDS: Attribute Decomposition for Improved Compression and Query Performance in Columnar Storage | 2020 | VLDB | 4.5897316e-05 |
| 10,126 | Visual Template Inference for Data Extraction from Documents | 2026 | SIGMOD | 4.1945683e-05 |
| 11,691 | Enabling Data Science for the Majority | 2019 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 17 of 17 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 5,794 | Discovering Related Data At Scale | 2021 | VLDB | 5.3245122e-05 |
| 637 | Automatic segmentation of text into structured records | 2001 | SIGMOD | 0.00018824614 |
| 7,643 | Cross Modal Data Discovery over Structured and Unstructured Data Lakes | 2023 | VLDB | 4.6901105e-05 |
| 11,874 | Graph-based Exploration of Non-graph Datasets | 2016 | VLDB | 4.1945683e-05 |
| 11,732 | CoreKG: a Knowledge Lake Service | 2018 | VLDB | 4.1945683e-05 |
| 8,917 | Data Lakes Empowered by Knowledge Graph Technologies | 2021 | SIGMOD | 4.427232e-05 |
| 3,358 | Organizing Data Lakes for Navigation | 2020 | SIGMOD | 7.1784949e-05 |
| 10,797 | A Demonstration of QueryArtisan: Real-Time Data Lake Analysis via Dynamically Generated Data Manipulation Code | 2025 | VLDB | 4.1945683e-05 |
| 9,961 | QueryArtisan: Generating Data Manipulation Codes for Ad-hoc Analysis in Data Lakes | 2025 | VLDB | 4.2294678e-05 |
| 1,116 | Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes | 2024 | VLDB | 0.00013890154 |