Journal of Software, 2020, 31(6): 1860-1874

Heuristic Based Resource Provisioning Approach for Big Data Analytics in Cloud Environment
WU Yue-Wen (吴悦文), WU Heng (吴恒), REN Jie (任杰), ZHANG Wen-Bo (张文博), WEI Jun (魏峻), WANG Tao (王焘), ZHONG Hua (钟华)
(Technology Center of Software Engineering, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China; Science & Technology on Integrated Information System Laboratory (Institute of Software, Chinese Academy of Sciences), Beijing 100190, China)
Received:June 06, 2018    Revised:September 30, 2018
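> Chinese abstract (translated): Cloud computing has become the mainstream runtime environment for big data analytics jobs, and choosing suitable cloud resources to optimize their performance remains a major challenge. Current research mainly addresses the diversity of big data analytics frameworks (such as Hadoop and Spark) and uses machine learning methods for resource provisioning, but with few samples these methods easily fall into local optima. This paper proposes RP-CH, a heuristic cloud resource provisioning method based on workload classification for big data environments. Exploiting the shared nature of cloud resources, RP-CH collects runtime monitoring data and cloud resource configuration information from other big data analytics jobs, establishes heuristic rules that link workload classes to optimized cloud resource configurations, and applies these rules to the acquisition function of the Bayesian optimization algorithm. Results on the HiBench and SparkBench benchmarks show that, compared with the existing method CherryPick, RP-CH improves the performance of big data analytics jobs by 58% on average and reduces cost by 44% on average.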
Abstract: Picking the best cloud configuration for recurring big data analytics jobs running in clouds is a major challenge. Prior efforts, such as CherryPick, may settle on a sub-optimal configuration because they must search a broad spectrum of cloud configurations with only a few test runs. RP-CH, presented in this paper, is a resource provisioning system that uses heuristic rules based on classification information to identify the optimal cloud configuration for big data analytics jobs. The key insight is to classify a job by comparing its resource preference and usage information with those of other jobs. Heuristic rules are then used to distinguish bad samples from good ones in the Bayesian optimization algorithm. Experiments with HiBench and SparkBench on Aliyun ECS show that, compared with CherryPick, job performance improves by 58% on average, while resource cost is reduced by 44% on average.
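
The abstract does not spell out RP-CH's rules, so the following is only a minimal sketch of the general idea it describes: a heuristic weight, derived from a job's workload class, scales the acquisition score that Bayesian optimization uses to choose the next configuration to try. The `Config` type, the class labels, the memory-per-vCPU thresholds, the 0.2 penalty, and the hard-coded surrogate predictions are all hypothetical and are not taken from the paper. Expected improvement is used here as a common acquisition function, which is an assumption about the setup.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """A hypothetical cloud configuration (not from the paper)."""
    name: str
    vcpus: int
    memory_gb: int


def expected_improvement(mean, std, best):
    """Expected improvement for minimization (e.g. of running time or cost)."""
    if std <= 0.0:
        return max(best - mean, 0.0)
    z = (best - mean) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + std * pdf


def heuristic_weight(config, job_class):
    """Assumed rule: down-weight configurations that mismatch the job class."""
    ratio = config.memory_gb / config.vcpus
    if job_class == "memory-intensive":
        return 1.0 if ratio >= 4 else 0.2
    if job_class == "cpu-intensive":
        return 1.0 if ratio <= 4 else 0.2
    return 1.0  # unknown class: plain expected improvement


def pick_next(candidates, predictions, best, job_class):
    """Return the candidate with the highest heuristic-weighted score."""
    def score(config):
        mean, std = predictions[config.name]
        return heuristic_weight(config, job_class) * expected_improvement(mean, std, best)
    return max(candidates, key=score)


# Made-up candidates and surrogate predictions (mean, std) of running time.
CANDIDATES = [
    Config("compute-8", vcpus=8, memory_gb=16),
    Config("general-8", vcpus=8, memory_gb=32),
    Config("memory-8", vcpus=8, memory_gb=64),
]
PREDICTIONS = {
    "compute-8": (90.0, 30.0),
    "general-8": (98.0, 5.0),
    "memory-8": (95.0, 10.0),
}
BEST_SO_FAR = 100.0

plain = pick_next(CANDIDATES, PREDICTIONS, BEST_SO_FAR, "unknown")
guided = pick_next(CANDIDATES, PREDICTIONS, BEST_SO_FAR, "memory-intensive")
print(plain.name, guided.name)
```

With these made-up numbers, the unweighted search is drawn to the high-uncertainty `compute-8` candidate, while the weighted search for a job classified as memory-intensive moves to `memory-8`. This illustrates how class-based rules could steer a sample-limited search away from poor test runs.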
Article ID:     CLC number: TP316    Document code:
Foundation items:National Key Research and Development Program of China (2017YFB1400804); Beijing Natural Science Foundation (4182070); Ant Financial Research Fund (XZ502017000730); Youth Innovation Promotion Association of Chinese Academy of Sciences Fund (2018144)
Reference text:

WU Yue-Wen, WU Heng, REN Jie, ZHANG Wen-Bo, WEI Jun, WANG Tao, ZHONG Hua. Heuristic Based Resource Provisioning Approach for Big Data Analytics in Cloud Environment. Journal of Software, 2020, 31(6): 1860-1874