Journal of Software, 2015, 26(8): 2091-2110

一种优化MapReduce系统能耗的数据布局算法
宋杰,王智,李甜甜,于戈
(东北大学 软件学院, 辽宁 沈阳 110819;东北大学 信息科学与工程学院, 辽宁 沈阳 110819)
Energy Consumption Optimization Data Placement Algorithm for MapReduce System
SONG Jie, WANG Zhi, LI Tian-Tian, YU Ge
(Software College, Northeastern University, Shenyang 110819, China; School of Information Science and Engineering, Northeastern University, Shenyang 110819, China)
Received:June 11, 2014    Revised:December 09, 2014
Abstract: Driven by cloud computing and big data technologies, the scale of IT resources keeps growing, and their energy consumption has become an increasingly pressing problem. Studies show that low node resource utilization and the energy wasted by idle resources are among the main problems of today's large-scale distributed systems. This paper studies energy consumption optimization for MapReduce systems. Traditional software-based optimization approaches mostly rely on workload concentration, live task migration, or dynamically powering nodes on and off. In a MapReduce cluster, however, a node not only executes tasks but also stores data, so it cannot simply be shut down to save energy once its tasks have been migrated, and the traditional approaches are hard to apply. This paper proposes that a good data placement can optimize the energy consumption of a MapReduce cluster. Based on this idea, it first defines the energy consumption optimization target of data placement and proposes a corresponding data placement algorithm. It then proves theoretically that the algorithm achieves this target. Finally, three MapReduce systems with different data placements are deployed on a heterogeneous cluster, and their cluster energy consumption is compared while running three typical kinds of jobs (CPU-intensive, I/O-intensive, and interactive) to verify the effect of the proposed algorithm. Both the theoretical and the experimental results show that the proposed placement algorithm effectively reduces the energy consumption of a MapReduce cluster. This work should facilitate applications of energy-intensive computing and big data analysis.
Keywords: energy consumption optimization; MapReduce; data placement; big data
Foundation items: National Natural Science Foundation of China (61202088, 61433008); China Postdoctoral Science Foundation (2013M540232); Specialized Research Fund for the Doctoral Program of Higher Education of China (20120042110028); Fundamental Research Funds for the Central Universities, seed fund (N130417001)
Reference text:

宋杰,王智,李甜甜,于戈.一种优化MapReduce系统能耗的数据布局算法.软件学报,2015,26(8):2091-2110

SONG Jie, WANG Zhi, LI Tian-Tian, YU Ge. Energy Consumption Optimization Data Placement Algorithm for MapReduce System. Journal of Software, 2015, 26(8): 2091-2110