Database Paper Browser

Enabling Efficient and General Subpopulation Analytics in Multidimensional Data Streams

Summary: Hydra enables real-time, general subpopulation analytics on multidimensional data streams. It combines a 'sketch of sketches' with universal sketching to bound errors across a combinatorial number of subpopulations. Its Spark plugin implementation minimizes overhead and memory use, delivering interactive estimates with order-of-magnitude gains over Spark and Druid. (summarized by gpt-5-nano on Feb 09 2026)
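
The 'sketch of sketches' idea in the summary can be illustrated with a small sketch. This is not the paper's implementation: it assumes a count-min sketch as a stand-in for the universal sketch, and the names `CountMin`, `SketchOfSketches`, `update`, and `estimate` are hypothetical. Each record is expanded into every non-empty combination of its dimension values, each combination is hashed to one inner sketch per row, and a query takes the minimum estimate across rows.

```python
import hashlib
from itertools import combinations


def _hash(key: str, seed: int, mod: int) -> int:
    """Deterministic hash of `key` into [0, mod), varied by `seed`."""
    digest = hashlib.blake2b(f"{seed}:{key}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % mod


class CountMin:
    """Small count-min sketch (assumed stand-in for a universal sketch)."""

    def __init__(self, depth: int = 3, width: int = 64):
        self.depth, self.width = depth, width
        self.table = [[0] * width for _ in range(depth)]

    def add(self, key: str, count: int = 1) -> None:
        for d in range(self.depth):
            self.table[d][_hash(key, d, self.width)] += count

    def estimate(self, key: str) -> int:
        # Counters only grow, so the minimum never underestimates.
        return min(self.table[d][_hash(key, d, self.width)] for d in range(self.depth))


class SketchOfSketches:
    """Fixed rows x cols grid of inner sketches, indexed by subpopulation hash."""

    def __init__(self, rows: int = 3, cols: int = 16):
        self.rows, self.cols = rows, cols
        self.grid = [[CountMin() for _ in range(cols)] for _ in range(rows)]

    @staticmethod
    def _key(pairs) -> str:
        return "|".join(f"{k}={v}" for k, v in pairs)

    @classmethod
    def subpopulations(cls, dims: dict) -> list:
        """All non-empty combinations of (dimension, value) pairs in a record."""
        items = sorted(dims.items())
        return [
            cls._key(combo)
            for r in range(1, len(items) + 1)
            for combo in combinations(items, r)
        ]

    def update(self, dims: dict, metric: str) -> None:
        # One inner-sketch update per row for every subpopulation of the record.
        for subpop in self.subpopulations(dims):
            for r in range(self.rows):
                col = _hash(subpop, 1000 + r, self.cols)
                self.grid[r][col].add(f"{subpop}#{metric}")

    def estimate(self, query: dict, metric: str) -> int:
        """Estimated count of `metric` within the queried subpopulation."""
        subpop = self._key(sorted(query.items()))
        return min(
            self.grid[r][_hash(subpop, 1000 + r, self.cols)].estimate(f"{subpop}#{metric}")
            for r in range(self.rows)
        )
```

Memory stays fixed at rows x cols inner sketches no matter how many subpopulations appear, at the cost of overestimates when subpopulations collide in a cell. For example, after calling `update({"city": "NYC", "isp": "A"}, "200")` five times, `estimate({"city": "NYC"}, "200")` returns at least 5.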

Paper ID: 12805
Venue: VLDB
Year: 2022
Pagerank: 4.7180004e-05
Overall Rank: 7,534 | 47.59%
DOI: 10.14778/3551793.3551867

Incoming Citations (Sorted by Pagerank)

Showing 3 of 3 citing papers.

Outgoing Citations (Sorted by Pagerank)

Showing 40 of 40 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
3 | Pig Latin: A Not-So-Foreign Language for Data Processing | 2008 | SIGMOD | 0.0024183614
11 | Implementing Data Cubes Efficiently | 1996 | SIGMOD | 0.0011708144
14 | Online Aggregation | 1997 | SIGMOD | 0.0010801504
22 | SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets | 2008 | VLDB | 0.0008456613
66 | Spark SQL: Relational Data Processing in Spark | 2015 | SIGMOD | 0.00061639801
70 | Hive - A Warehousing Solution Over a Map-Reduce Framework | 2009 | VLDB | 0.00059533166
109 | Dremel: Interactive Analysis of Web-Scale Datasets | 2010 | VLDB | 0.00048186983
126 | Space-Efficient Online Computation of Quantile Summaries | 2001 | SIGMOD | 0.00044744986
191 | The Design of the Borealis Stream Processing Engine | 2005 | CIDR | 0.00035738595
210 | Gorilla: A Fast, Scalable, In-Memory Time Series Database | 2015 | VLDB | 0.0003404384
273 | Approximate Computation of Multidimensional Aggregates of Sparse Data Using Wavelets | 1999 | SIGMOD | 0.00029390945
314 | MillWheel: Fault-Tolerant Stream Processing at Internet Scale | 2013 | VLDB | 0.00028084774
323 | Gigascope: A Stream Database for Network Applications | 2003 | SIGMOD | 0.00027492196
402 | Mergeable Summaries | 2012 | PODS | 0.00024196343
429 | The Aqua Approximate Query Answering System | 1999 | SIGMOD | 0.00023476494
460 | SeeDB: Efficient Data-Driven Visualization Recommendations to Support Visual Analytics | 2015 | VLDB | 0.00022516069
476 | Impala: A Modern, Open-Source SQL Engine for Hadoop | 2015 | CIDR | 0.00022226941
739 | Congressional Samples for Approximate Answering of Group-By Queries | 2000 | SIGMOD | 0.00017401518
943 | Wander Join: Online Aggregation via Random Walks | 2016 | SIGMOD | 0.00015145883
1,204 | VerdictDB: Universalizing Approximate Query Processing | 2018 | SIGMOD | 0.00013319541
1,464 | Online Aggregation for Large MapReduce Jobs | 2011 | VLDB | 0.00011865546
1,487 | Scuba: Diving into Data at Facebook | 2013 | VLDB | 0.00011701099
1,574 | Approximate Query Processing: No Silver Bullet | 2017 | SIGMOD | 0.00011287495
1,588 | Druid: A Real-time Analytical Data Store | 2014 | SIGMOD | 0.00011239313
1,909 | SciBORQ: Scientific data management with Bounds On Runtime and Quality | 2011 | CIDR | 0.00010121304
1,955 | Efficient Computation of Iceberg Cubes with Complex Measures | 2001 | SIGMOD | 9.9629452e-05
1,990 | Fault-Tolerance in the Borealis Distributed Stream Processing System | 2005 | SIGMOD | 9.8472819e-05
2,126 | MacroBase: Prioritizing Attention in Fast Data | 2017 | SIGMOD | 9.4887794e-05
2,501 | DBEst: Revisiting Approximate Query Processing Engines with Machine Learning Models | 2019 | SIGMOD | 8.6453446e-05
2,953 | Moment-Based Quantile Sketches for Efficient High Cardinality Aggregation Queries | 2018 | VLDB | 7.8267643e-05
3,157 | High-Dimensional OLAP: A Minimal Cubing Approach | 2004 | VLDB | 7.4656511e-05
3,388 | Analytics in Motion: High Performance Event-Processing AND Real-Time Analytics in the Same Database | 2015 | SIGMOD | 7.1571148e-05
3,590 | Quotient Cube: How to Summarize the Semantics of a Data Cube | 2002 | VLDB | 6.9421381e-05
3,614 | Persistent Data Sketching | 2015 | SIGMOD | 6.9147318e-05
4,029 | Spatial Online Sampling and Aggregation | 2016 | VLDB | 6.51315e-05
4,052 | Interactive Analysis of Web-Scale Data | 2009 | CIDR | 6.4936745e-05
5,903 | Building Wavelet Histograms on Large Data in MapReduce | 2012 | VLDB | 5.2791351e-05
6,244 | Approximate Distinct Counts for Billions of Datasets | 2019 | SIGMOD | 5.139669e-05
6,646 | Geospatial Stream Query Processing using Microsoft SQL Server StreamInsight | 2010 | VLDB | 4.9772435e-05
8,673 | CoopStore: Optimizing Precomputed Summaries for Aggregation | 2020 | VLDB | 4.4709116e-05

Semantically Similar Papers