TurBO: A cost-efficient configuration-based auto-tuning approach for cluster-based big data frameworks

Title

TurBO: A cost-efficient configuration-based auto-tuning approach for cluster-based big data frameworks

Subject

Big data
Benchmarking
Iterative methods
Data handling
Petroleum reservoir evaluation

Description

Big data processing frameworks such as Spark usually expose a large number of performance-related configuration parameters, and how to auto-tune these parameters for better performance has been a hot issue in academia and industry for years. By carefully trading off exploration against exploitation, Bayesian Optimization (BO) is currently the most appealing algorithm for configuration auto-tuning. However, given the tuning cost constraints encountered in practice, three critical limitations prevent conventional BO-based approaches from being applied directly to auto-tuning cluster-based big data frameworks. In this paper, we propose a cost-efficient configuration auto-tuning approach named TurBO for big data frameworks, based on two enhancements of vanilla BO: 1) to reduce the number of iterations required, TurBO integrates a well-designed adaptive pseudo point mechanism with BO; 2) to avoid, as far as possible, the time-consuming practical evaluation of sub-optimal configurations, TurBO leverages the proposed CASampling method to handle these sub-optimal configurations intelligently, based on ensemble learning with historical tuning experiences. To evaluate the performance of TurBO, we conducted a series of experiments on a local Spark cluster with 9 different HiBench benchmark applications. Overall, compared with 3 representative BO-based baseline approaches, OpenTuner, Bliss and ResTune, TurBO speeds up the tuning procedure by factors of 2.24, 2.29 and 1.97 on average, respectively. In addition, TurBO always achieves a positive cumulative performance gain under the simulated dynamic workload scenario, which indicates that TurBO is well suited to workload changes in big data applications. © 2023 Elsevier Inc.
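The second enhancement in the abstract, skipping the real evaluation of configurations that historical data suggests are sub-optimal, can be illustrated with a small sketch. This is not the paper's CASampling method, whose details are not given in this record. It is a hypothetical stand-in that assumes a one-dimensional normalized configuration, a made-up tuning history, and a simple ensemble of k-nearest-neighbour predictors; the names `knn_predict`, `ensemble_predict`, `should_evaluate` and the `slack` threshold are all illustrative assumptions.

```python
import statistics

# Hypothetical tuning history: (normalized config, measured runtime in seconds).
history = [
    ((0.0,), 60.0),
    ((0.25,), 30.0),
    ((0.5,), 10.0),
    ((0.75,), 28.0),
    ((1.0,), 55.0),
]


def knn_predict(history, x, k):
    """Predict the runtime of config x as the mean runtime of its k nearest neighbours."""
    def sq_dist(entry):
        return sum((a - b) ** 2 for a, b in zip(entry[0], x))

    nearest = sorted(history, key=sq_dist)[:k]
    return statistics.mean([runtime for _, runtime in nearest])


def ensemble_predict(history, x, ks=(1, 2, 3)):
    """Average the predictions of several k-NN models (a very simple ensemble)."""
    return statistics.mean([knn_predict(history, x, k) for k in ks])


def should_evaluate(history, x, slack=2.0):
    """Run the real job only if the predicted runtime is within slack * best runtime seen."""
    best = min(runtime for _, runtime in history)
    return ensemble_predict(history, x) <= slack * best


# A candidate close to the best known config is worth a real run;
# one predicted to be far slower is skipped to save tuning cost.
print(should_evaluate(history, (0.45,)))  # True
print(should_evaluate(history, (0.95,)))  # False
```

In this toy setting the candidate at 0.95 has a predicted runtime of 42.5 s against a threshold of 20 s, so the costly cluster run is avoided. A real tuner would feed such a filter with candidates proposed by the BO acquisition function rather than hand-picked points.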
Pages 89-105
Volume 177

Creator

Dou, Hui
Zhang, Lei
Zhang, Yiwen
Chen, Pengfei
Zheng, Zibin

Publisher

Journal of Parallel and Distributed Computing

Date

2023

Type

journalArticle

Identifier

7437315
10.1016/j.jpdc.2023.03.002

Citation

Dou, Hui et al., “TurBO: A cost-efficient configuration-based auto-tuning approach for cluster-based big data frameworks,” Lamar University Midstream Center Research, accessed May 18, 2024, https://lumc.omeka.net/items/show/29173.
