Database Paper Browser

Navigating the Data Lake with DATAMARAN: Automatically Extracting Structure from Log Datasets

Summary: Datamaran automatically extracts structure from semi-structured log data, identifying field and record endpoints and filtering out noise. It discovers structures without boundaries, achieving 95% extraction accuracy on GitHub logs, ~66% higher than unsupervised schemes. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 5466
Venue: SIGMOD
Year: 2018
Pagerank: 6.8384476e-05
Overall Rank: 3,690 | 74.34%
DOI: 10.1145/3183713.3183746

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 8 of 8 citing papers.

Outgoing Citations (Sorted by Pagerank)

Showing 17 of 17 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
107 | WebTables: Exploring the Power of Tables on the Web | 2008 | VLDB | 0.00048377684
112 | Potter's Wheel: An Interactive Data Cleaning System | 2001 | VLDB | 0.00047045036
364 | Annotating and Searching Web Tables Using Entities, Types and Relationships | 2010 | VLDB | 0.00025637562
420 | InfoGather: Entity Augmentation and Attribute Discovery By Holistic Matching with Web Tables | 2012 | SIGMOD | 0.00023719065
533 | RoadRunner: Towards Automatic Data Extraction from Large Web Sites | 2001 | VLDB | 0.00020757722
587 | Extracting Structured Data from Web Pages | 2003 | SIGMOD | 0.00019648348
610 | Goods: Organizing Google's Datasets | 2016 | SIGMOD | 0.00019232674
818 | Finding Related Tables | 2012 | SIGMOD | 0.00016311524
1,001 | Recovering Semantics of Tables on the Web | 2011 | VLDB | 0.00014706505
1,267 | Foofah: Transforming Data By Example | 2017 | SIGMOD | 0.00012936483
1,317 | Harvesting Relational Tables from Lists on the Web | 2009 | VLDB | 0.00012625853
1,585 | Answering Table Augmentation Queries from Unstructured Lists on the Web | 2009 | VLDB | 0.00011255098
1,833 | Data Wrangling: The Challenging Journey from the Wild to the Lake | 2015 | CIDR | 0.00010378976
3,229 | InfoGather+: Semantic Matching and Annotation of Numeric and Time-Varying Attributes in Web Tables | 2013 | SIGMOD | 7.3393682e-05
3,281 | Constance: An Intelligent Data Lake System | 2016 | SIGMOD | 7.2823287e-05
4,440 | Robust Web Extraction: An Approach Based on a Probabilistic Tree-Edit Model | 2009 | SIGMOD | 6.187819e-05
5,399 | Joint Unsupervised Structure Discovery and Information Extraction | 2011 | SIGMOD | 5.5291067e-05

Semantically Similar Papers