A Survey on Automatic Parameter Tuning for Big Data Processing Systems
Journal
ACM Computing Surveys
Date Issued
June 2020
Author(s)
DOI
10.1145/3381027
Abstract
Big data processing systems (e.g., Hadoop, Spark, Storm) contain a vast number of configuration parameters controlling parallelism, I/O behavior, memory settings, and compression. Improper parameter settings can cause significant performance degradation and stability issues. However, regular users and even expert administrators struggle to understand and tune these parameters to achieve good performance. We investigate existing approaches to parameter tuning for both batch and stream data processing systems and classify them into six categories: rule-based, cost modeling, simulation-based, experiment-driven, machine learning, and adaptive tuning. We summarize the pros and cons of each approach and raise some open research problems for automatic parameter tuning.
File(s)
Name
3381027.pdf
Size
900.91 KB
Format
Adobe PDF
Checksum (MD5)
63b433c0929065d2ebf2d8c9a0c6572e

