Database Paper Browser

MAD Skills: New Analysis Practices for Big Data

Summary: MAD (Magnetic, Agile, Deep) data analysis marks a radical shift from traditional Enterprise Data Warehouse (EDW) and Business Intelligence (BI) practice for big data. The paper presents data-parallel density methods and SQL/MapReduce-enabled workflows on Greenplum to enable agile analytics for advertising networks. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 9857
Venue: VLDB
Year: 2009
Pagerank: 0.00038946305
Overall Rank: 168 | 98.84%
DOI: -

Incoming Citations (Sorted by Pagerank)

Showing 50 of 53 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
42 | A Comparison of Approaches to Large-Scale Data Analysis | 2009 | SIGMOD | 0.00073498298
66 | Spark SQL: Relational Data Processing in Spark | 2015 | SIGMOD | 0.00061639801
140 | The MADlib Analytics Library or MAD Skills, the SQL | 2012 | VLDB | 0.00042270404
542 | Shark: SQL and Rich Analytics at Scale | 2013 | SIGMOD | 0.00020595648
656 | ERACER: A Database Approach for Statistical Inference and Data Cleaning | 2010 | SIGMOD | 0.00018588729
658 | Towards a Unified Architecture for in-RDBMS Analytics | 2012 | SIGMOD | 0.00018506577
761 | Materialization Optimizations for Feature Selection Workloads | 2014 | SIGMOD | 0.00017053783
794 | Hadoop++: Making a Yellow Elephant Run Like a Cheetah (Without It Even Noticing) | 2010 | VLDB | 0.00016605103
947 | MRShare: Sharing Across Multiple Queries in MapReduce | 2010 | VLDB | 0.00015114576
1,071 | Starfish: A Self-tuning System for Big Data Analytics | 2011 | CIDR | 0.00014312777
1,265 | Jaql: A Scripting Language for Large Scale Semistructured Data Analysis | 2011 | VLDB | 0.00012947629
1,343 | NoDB: Efficient Query Execution on Raw Data Files | 2012 | SIGMOD | 0.00012482538
1,402 | Hybrid Parallelization Strategies for Large-Scale Machine Learning in SystemML | 2014 | VLDB | 0.00012180605
1,495 | Ricardo: Integrating R and Hadoop | 2010 | SIGMOD | 0.00011691049
1,532 | Data Management in Machine Learning: Challenges, Techniques, and Systems | 2017 | SIGMOD | 0.00011472681
1,876 | ArrayStore: A Storage Manager for Complex Parallel Array Processing | 2011 | SIGMOD | 0.00010239284
1,967 | Compressed Linear Algebra for Large-Scale Machine Learning | 2016 | VLDB | 9.9131712e-05
2,122 | SystemDS: A Declarative Machine Learning System for the End-to-End Data Science Lifecycle | 2020 | CIDR | 9.4989076e-05
2,337 | Efficient Processing of Data Warehousing Queries in a Split Execution Environment | 2011 | SIGMOD | 9.0098186e-05
2,667 | Cumulon: Optimizing Statistical Data Analysis in the Cloud | 2013 | SIGMOD | 8.3413995e-05
3,066 | HAWQ: A Massively Parallel Processing SQL Engine in Hadoop | 2014 | SIGMOD | 7.6221974e-05
3,081 | Knowledge Expansion over Probabilistic Knowledge Bases | 2014 | SIGMOD | 7.6031501e-05
3,601 | Large-Scale Machine Learning at Twitter | 2012 | SIGMOD | 6.9315087e-05
3,918 | On Optimizing Operator Fusion Plans for Large-Scale Machine Learning in SystemML | 2018 | VLDB | 6.6315176e-05
3,988 | All-in-One: Graph Processing in RDBMSs Revisited | 2017 | SIGMOD | 6.5589605e-05
4,033 | In-RDBMS Hardware Acceleration of Advanced Analytics | 2018 | VLDB | 6.5113267e-05
4,548 | Efficient and Portable Einstein Summation in SQL | 2023 | SIGMOD | 6.0953447e-05
4,802 | Resource Elasticity for Large-Scale Machine Learning | 2015 | SIGMOD | 5.9114415e-05
5,294 | GLADE: Big Data Analytics Made Easy | 2012 | SIGMOD | 5.5810654e-05
5,688 | PREDIcT: Towards Predicting the Runtime of Large Scale Iterative Analytics | 2013 | VLDB | 5.3702808e-05
5,903 | Building Wavelet Histograms on Large Data in MapReduce | 2012 | VLDB | 5.2791351e-05
5,964 | Bridging Two Worlds with RICE: Integrating R into the SAP In-Memory Computing Engine | 2011 | VLDB | 5.2520617e-05
5,969 | MCDB-R: Risk Analysis in the Database | 2010 | VLDB | 5.2489117e-05
6,645 | Functional-Style SQL UDFs With a Capital 'F' | 2020 | SIGMOD | 4.978205e-05
6,990 | Machine Learning, Linear Algebra, and More: Is SQL All You Need? | 2022 | CIDR | 4.8704904e-05
7,179 | Coresets over Multiple Tables for Feature-rich and Data-efficient Machine Learning | 2023 | VLDB | 4.8078895e-05
7,237 | CleanM: An Optimizable Query Language for Unified Scale-Out Data Cleaning | 2017 | VLDB | 4.7928651e-05
7,264 | Online Expansion of Large-scale Data Warehouses | 2011 | VLDB | 4.7842311e-05
7,704 | ExDRa: Exploratory Data Science on Federated Raw Data | 2021 | SIGMOD | 4.6733838e-05
8,008 | Entity Resolution On-Demand | 2022 | VLDB | 4.6067684e-05
8,399 | UDA-GIST: An In-database Framework to Unify Data-Parallel and State-Parallel Analytics | 2015 | VLDB | 4.5257744e-05
8,444 | Not Black-Box Anymore! Enabling Analytics-Aware Optimizations in Teradata Vantage | 2021 | VLDB | 4.5118994e-05
9,379 | GIO: Generating Efficient Matrix and Frame Readers for Custom Data Formats by Example | 2023 | SIGMOD | 4.3462787e-05
9,426 | Storing Matrices on Disk: Theory and Practice Revisited | 2011 | VLDB | 4.3441378e-05
9,437 | BlockJoin: Efficient Matrix Partitioning Through Joins | 2017 | VLDB | 4.3425552e-05
9,670 | On Efficient Large Sparse Matrix Chain Multiplication | 2024 | SIGMOD | 4.3066148e-05
10,482 | Fast and Scalable Data Transfer Across Data Systems | 2025 | SIGMOD | 4.1945683e-05
10,628 | CatDB: Data-catalog-guided, LLM-based Generation of Data-centric ML Pipelines | 2025 | VLDB | 4.1945683e-05
10,998 | Database Native Model Selection: Harnessing Deep Neural Networks in Database Systems | 2024 | VLDB | 4.1945683e-05
11,749 | An Authorization Model for Multi-Provider Queries | 2018 | VLDB | 4.1945683e-05
Outgoing Citations (Sorted by Pagerank)

Showing 9 of 9 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Semantically Similar Papers