Journal of Software, 2016, 27(7): 1861-1875

Datacenter-Oriented Data Placement Strategy of Workflows in Hybrid Cloud
LI Xue-Jun, WU Yang, LIU Xiao, CHENG Hui-Min, ZHU Er-Zhou, YANG Yun
(Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education (Anhui University), Hefei 230039, China; School of Computer Science and Technology, Anhui University, Hefei 230601, China; Software Engineering Institute, East China Normal University, Shanghai 200062, China; School of Computer Science and Technology, Anhui University, Hefei 230601, China; School of Information Technology, Swinburne University of Technology, Melbourne 3122, Australia)
Received:September 13, 2014    Revised:June 11, 2015
Abstract: A scientific workflow is a complex, data-intensive application. Placing its data effectively in a hybrid cloud environment is a key problem, and the security requirements of hybrid clouds pose new challenges for data placement. Traditional data placement strategies mostly use a load-balancing-based partition model to allocate datasets. These strategies balance load well, but their data transfer time is not necessarily optimal. To address this shortcoming, and taking the characteristics of data placement in hybrid clouds into account, this paper first designs a matrix partition model based on data dependency destruction, which produces the partition that destroys the least data dependency. It then proposes a datacenter-oriented data placement strategy that, guided by the partition model, places highly dependent datasets in the same datacenter wherever possible, reducing the time spent transferring datasets between datacenters. Experimental results show that the proposed strategy effectively reduces cross-datacenter data transfer time during scientific workflow execution.
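The idea of a partition that destroys as little data dependency as possible can be illustrated with a small sketch. The abstract does not give the paper's actual model, so everything below is an assumption made for illustration: the dependency between two datasets is taken to be the number of tasks that use both, the "destruction" of a two-way split is the total dependency between datasets that end up in different groups, and the best split is found by brute force. The function names (`dependency_matrix`, `destruction`, `best_bipartition`) and the sample data are hypothetical, and the security constraints of a hybrid cloud are not modeled.

```python
from itertools import combinations


def dependency_matrix(dataset_tasks):
    """Build a pairwise dependency table.

    dataset_tasks maps a dataset name to the set of task ids that use it.
    Assumption: dependency(a, b) = number of tasks that use both a and b.
    """
    names = sorted(dataset_tasks)
    dep = {
        a: {b: len(dataset_tasks[a] & dataset_tasks[b]) for b in names}
        for a in names
    }
    return names, dep


def destruction(dep, group_a, group_b):
    """Total dependency broken by placing group_a and group_b apart."""
    return sum(dep[a][b] for a in group_a for b in group_b)


def best_bipartition(dataset_tasks):
    """Brute-force the two-way split with the least broken dependency.

    Requires at least two datasets. The first dataset is pinned to group_a
    so that each split is examined only once.
    """
    names, dep = dependency_matrix(dataset_tasks)
    first, rest = names[0], names[1:]
    best = None
    # k < len(rest) guarantees that group_b is never empty.
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            group_a = (first,) + extra
            group_b = tuple(n for n in rest if n not in extra)
            cost = destruction(dep, group_a, group_b)
            if best is None or cost < best[0]:
                best = (cost, group_a, group_b)
    return best


# Hypothetical workflow: four datasets and the tasks that read them.
example = {
    "d1": {"t1", "t2"},
    "d2": {"t1", "t2", "t3"},
    "d3": {"t4", "t5"},
    "d4": {"t3", "t4", "t5"},
}
result = best_bipartition(example)
print(result)  # (1, ('d1', 'd2'), ('d3', 'd4'))
```

In this toy input, d1 and d2 share two tasks, as do d3 and d4, while only one task links the two pairs, so keeping each pair in one datacenter breaks the least dependency. Brute force is exponential in the number of datasets and is used here only to make the objective concrete.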
Foundation items: National Natural Science Foundation of China (61300042, 61300169)
Reference text:

LI Xue-Jun, WU Yang, LIU Xiao, CHENG Hui-Min, ZHU Er-Zhou, YANG Yun. Datacenter-Oriented Data Placement Strategy of Workflows in Hybrid Cloud. Journal of Software, 2016, 27(7): 1861-1875