Journal of Software, 2017, 28(7): 1655-1675

ParaC: A Domain Programming Framework of Image Processing on GPU Accelerators
LU Xing-Jing, LIU Lei, JIA Hai-Peng, FENG Xiao-Bing, WU Cheng-Gang
(State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China)
Received: September 5, 2016; Revised: October 14, 2016
Abstract: GPGPU accelerators are the mainstream platform for speeding up image processing algorithms. On such platforms, however, a version of a program that fully exploits hardware and software characteristics can outperform a naive implementation by an order of magnitude or more. A GPGPU accelerator offers a large number of threads organized in multiple dimensions and levels, together with a hierarchical memory system whose levels differ in capacity, bandwidth, latency, and access permissions. Image processing applications, in turn, involve complex computations, border-handling rules, and data access patterns. As a result, the concurrent execution model of tasks, the organization of threads, and the mapping of concurrent tasks to devices affect not only a program's degree of parallelism, scheduling, communication, and synchronization, but also its memory access bandwidth and latency. Optimizing programs for GPGPU platforms is therefore difficult, complicated, and inefficient. This paper proposes ParaC, a domain programming model based on language extensions. The ParaC programming environment uses the program semantics expressed through high-level language extensions to automatically analyze program characteristics such as operation information, data reuse among concurrent tasks, and memory access information. It combines these with hardware platform characteristics and applies a compilation optimization model driven by domain prior knowledge to automatically generate optimized code for GPGPU platforms. Finally, a source-to-source compiler emits standard OpenCL programs. Experimental results on the test cases show that the optimized versions generated automatically by ParaC achieve a speedup of up to 3.22x over hand-optimized versions, while requiring only 1.2% to 39.68% as many lines of code.
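
The abstract's points about border handling, data reuse among concurrent tasks, and memory access can be illustrated with a small sketch. The code below is not ParaC syntax and does not reproduce the paper's compiler; it is a plain Python model, with hypothetical function names, of a 3x3 box filter written two ways. The naive version treats every output pixel as an independent task that reads its window straight from the image. The tiled version assumes a work-group first stages a tile plus a one-pixel halo into a local buffer, which is the kind of reuse an optimizing framework might exploit. The read counters are a toy stand-in for global memory traffic, not a performance measurement.

```python
def clamp(v, lo, hi):
    """Clamp-to-edge border rule: out-of-range indices map to the nearest valid one."""
    return max(lo, min(v, hi))


def box_blur_naive(img, width, height):
    """One independent task per output pixel; each reads its 3x3 window directly."""
    out = [[0.0] * width for _ in range(height)]
    reads = 0  # modeled count of reads from the source image ("global memory")
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    sy = clamp(y + dy, 0, height - 1)
                    sx = clamp(x + dx, 0, width - 1)
                    acc += img[sy][sx]
                    reads += 1
            out[y][x] = acc / 9.0
    return out, reads


def box_blur_tiled(img, width, height, tile=4):
    """Same filter, but each tile (a hypothetical work-group) stages its pixels
    plus a one-pixel halo into a local buffer once, then computes from it."""
    out = [[0.0] * width for _ in range(height)]
    reads = 0
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            th = min(tile, height - ty)  # tile height, trimmed at the image edge
            tw = min(tile, width - tx)   # tile width, trimmed at the image edge
            # Stage tile + halo; the border rule is applied once, while loading.
            local = []
            for ly in range(-1, th + 1):
                row = []
                for lx in range(-1, tw + 1):
                    sy = clamp(ty + ly, 0, height - 1)
                    sx = clamp(tx + lx, 0, width - 1)
                    row.append(img[sy][sx])
                    reads += 1
                local.append(row)
            # Compute every output pixel of the tile from the local buffer only.
            for ly in range(th):
                for lx in range(tw):
                    acc = 0.0
                    for dy in (0, 1, 2):
                        for dx in (0, 1, 2):
                            acc += local[ly + dy][lx + dx]
                    out[ty + ly][tx + lx] = acc / 9.0
    return out, reads
```

In this model, an 8x8 image costs the naive version 9 reads per pixel (576 in total), while 4x4 tiles with a halo cost 36 reads per tile (144 in total) and produce the same output. The gap between the two versions is a small-scale analogue of the naive-versus-optimized difference the abstract describes.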
Foundation items: National Natural Science Foundation of China (61432018, 61402445, 61502452, 61602443); National Key R&D Program of China (2016YFB1000402); State Key Laboratory of Mathematical Engineering and Advanced Computing Open Foundation (2016A03); Beijing Municipal Science & Technology Commission Program (D161100001216002)
Reference text:

LU Xing-Jing, LIU Lei, JIA Hai-Peng, FENG Xiao-Bing, WU Cheng-Gang. ParaC: A Domain Programming Framework of Image Processing on GPU Accelerators. Journal of Software, 2017, 28(7): 1655-1675