Speedup Your Analytics: Automatic Parameter Tuning for Databases and Big Data Systems

Lu, Jiaheng; Chen, Yuxing; Herodotou, Herodotos; Babu, Shivnath

doi:10.14778/3352063.3352112

Journal

Proceedings of the VLDB Endowment

Date Issued

August 2019

Abstract

Database and big data analytics systems such as Hadoop and Spark have a large number of configuration parameters that control memory distribution, I/O optimization, parallelism, and compression. Improper parameter settings can cause significant performance degradation and stability issues. However, regular users and even expert administrators struggle to understand these parameters and tune them for good performance. In this tutorial, we review existing approaches to automatic parameter tuning for databases, Hadoop, and Spark, which we classify into six categories: rule-based, cost modeling, simulation-based, experiment-driven, machine learning, and adaptive tuning. We describe the foundations of the different automatic parameter tuning algorithms and present the pros and cons of each approach. We also highlight real-world applications and systems, and identify research challenges in handling cloud services, resource heterogeneity, and real-time analytics.

Subjects

MapReduce

Optimization

Management

Tuning algorithms

Real-time analytics