Journal of Software, 2014, 25(S2): 70-79

Package of the Vector Math Library Based on the Sunway Processor
XIE Qing-Chun, ZHANG Yun-Quan, LI Yan, PANG Ren-Bo, WU Zai-Long, LU Yong-Quan, GAO Peng-Dong
(High Performance Computing Center, Communication University of China, Beijing 100024, China; Laboratory of Parallel Computing, Institute of Software, The Chinese Academy of Sciences, Beijing 100190, China; High Performance Computing Center, Communication University of China, Beijing 100024, China; State Key Laboratory of Computer Architecture, Institute of Computing Technology, The Chinese Academy of Sciences, Beijing 100190, China; Department of Computer and Network, National Marine Environmental Forecasting Center, Beijing 100081, China; School of Information Science and Technology, The Ocean University of China, Qingdao 266100, China)
Received:August 05, 2013    Revised:March 13, 2014
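> Abstract (translated from the Chinese): This paper first introduces SIMD extension technology and analyzes three ways of using SIMD extensions, concluding that calling a third-party library optimized for a specific target platform is a good way for application-domain software developers to build efficient parallel programs quickly. Next, it introduces the domestically developed Sunway SW-1600 processor platform and SW-VML (SW Vector Math Library), which was developed using SIMD extensions, loop unrolling, and related techniques. During development, optimization methods for aligned memory access and for simplifying vector conditional branches were proposed, resolving the performance problems caused by unaligned memory access and by conversion between vector and scalar arrays. Based on the SW compiler's support for OpenMP, a multithreaded OpenMP version was also developed. Finally, SW-VML was tested on the SW-1600 platform with different vector sizes. The results show that SIMD vectorization achieves a speedup of 2.08 over the serial program, and that 4 threads achieve an average speedup of 2.26 over a single thread. SW-VML is a vector function package for developing efficient programs on the domestic Sunway processor series, and a basic software toolkit for developing high-performance programs on a single compute node of the Sunway BlueLight high-performance computing platform.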
Abstract: This paper first introduces SIMD (single instruction, multiple data) extension technology and presents three ways of using SIMD instructions. It argues that calling a third-party library optimized for the target platform with those instructions is the best way for developers to benefit from them. Next, it introduces the China-developed SW-1600 CPU and SW-VML, a software package of mathematical functions built with SIMD extension technology. To remove the overhead caused by unaligned memory access and by conversion between vector and scalar arrays, the paper proposes several optimization methods: aligned memory access, simplified vector conditional branches, and loop unrolling. SW-VML is also extended with OpenMP to support multithreading. Finally, the functions in the package are tested on the SW-1600 with arrays of different sizes. The results show that, compared with traditional scalar computation, SIMD vectorization reaches an average speedup of 2.06, and that the package running with 4 threads achieves a speedup of up to 2.26 over a single thread. SW-VML is a general-purpose vector function package for the domestic Sunway processor series, and it can serve as a basic toolkit for high-performance computing on the Sunway platform.
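The three optimizations named in the abstract (aligned access, simplified conditional branches, loop unrolling) can be illustrated with a minimal, portable C sketch. It is not SW-VML code and uses no SW-1600 intrinsics. The function name `vml_sketch_apply`, the example kernel, and the assumed width of four doubles per vector are all hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: one SIMD register holds four doubles (32 bytes). */
#define VEC_WIDTH 4

/* Hypothetical kernel: returns v*v for v >= 0 and 0 otherwise.
 * The condition is written as a value select, not as control flow
 * around the computation, so every lane does the same work. */
static inline double kernel(double v)
{
    double sq = v * v;              /* computed unconditionally */
    return (v >= 0.0) ? sq : 0.0;   /* select between two ready values */
}

/* Applies the kernel element-wise: y[i] = kernel(x[i]) for i in [0, n). */
void vml_sketch_apply(const double *x, double *y, size_t n)
{
    assert(x != NULL && y != NULL);
    size_t i = 0;

    /* Peel loop: scalar iterations until y + i reaches a 32-byte
     * boundary, so the main loop stores to aligned addresses. */
    while (i < n && ((uintptr_t)(y + i) % (VEC_WIDTH * sizeof(double))) != 0) {
        y[i] = kernel(x[i]);
        i++;
    }

    /* Main loop, unrolled by VEC_WIDTH: four independent operations
     * per iteration, the shape a SIMD instruction would cover. */
    for (; i + VEC_WIDTH <= n; i += VEC_WIDTH) {
        y[i]     = kernel(x[i]);
        y[i + 1] = kernel(x[i + 1]);
        y[i + 2] = kernel(x[i + 2]);
        y[i + 3] = kernel(x[i + 3]);
    }

    /* Tail loop: remaining elements that do not fill a full vector. */
    for (; i < n; i++) {
        y[i] = kernel(x[i]);
    }
}
```

Whether a compiler turns the unrolled body into actual vector instructions depends on the target, the compiler, and its flags; the sketch only shows the loop structure that makes such a mapping possible.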
Foundation items: National Natural Science Foundation of China (61133005, 61272136); National High Technology Research and Development Program of China (863 Program) (2012AA010902, 2012AA010903); Chinese Academy of Sciences Graduate Science and Technology Innovation and Social Practice Grant
Reference text:

XIE Qing-Chun, ZHANG Yun-Quan, LI Yan, PANG Ren-Bo, WU Zai-Long, LU Yong-Quan, GAO Peng-Dong. Package of the Vector Math Library Based on the Sunway Processor. Journal of Software, 2014, 25(S2): 70-79