Journal of Software, 2017, 28(4): 764-785

Design and Implementation of Parallel C Programming Language for Domestic Heterogeneous Many-Core Systems
HE Wang-Quan, LIU Yong, FANG Yan-Fei, WEI Di, QI Feng-Bin
(Jiangnan Institute of Computing Technology, Wuxi 214083, China)
Received:June 20, 2016    Revised:September 08, 2016
Abstract: Heterogeneous many-core architectures offer very high performance per watt and have become an important direction in supercomputer architecture. However, the more complex parallel and memory hierarchies of many-core systems pose great challenges to programming and optimization. Research on parallel programming techniques for many-core systems is therefore significant for reducing the difficulty of programming parallel applications on domestic many-core systems and for improving the performance of parallel programs. This work proposes a multi-mode parallel programming model on a unified architecture, comprising a heterogeneous-fused accelerated computing model and an autonomous computing model that is programmed in a homogeneous manner. Based on this programming model, the Parallel C language is designed to effectively describe the heterogeneous parallelism of the domestic many-core system. Compared with the MPI+X usage pattern on other many-core systems, both programming and system optimization in Parallel C have a global perspective, and the language has distinctive features in multi-level locality description, one-sided messaging, and compatibility with existing multi-core applications. A Parallel C compiler system built on Open64 fully supports both the accelerated computing model and the autonomous computing model. Optimizations including data layout and automatic DMA, compiler-directed thread proxy, and topology-aware collective communication are proposed and implemented. Test results for micro benchmarks and real applications on the Sunway TaihuLight computer system show that the Parallel C language and compiler system have good performance and scalability and can effectively support large-scale applications.
Foundation items: National Basic Research Program of China (973) (2016YFB0200502); National High Technology Research and Development Program of China (863) (2012AA010903, 2015AA01A301); Fund Projects of State Key Laboratory of Computer Architecture (CARCH201403)
Reference text:

HE Wang-Quan, LIU Yong, FANG Yan-Fei, WEI Di, QI Feng-Bin. Design and Implementation of Parallel C Programming Language for Domestic Heterogeneous Many-Core Systems. Journal of Software, 2017, 28(4): 764-785