Performance Optimizing Method for Sparse Convolutional Neural Networks on GPU

doi:10.13328/j.cnki.jos.006051

Volume 31, Issue 9, 2020, pp. 2944-2964


Author: Dong Xiao, Liu Lei, Li Jing, Feng Xiaobing
Fund Project: National Natural Science Foundation of China (61521092); National Key Research and Development Program of China (2017YFB1003103)

Abstract:

In recent years, deep convolutional neural networks have shown dominant capability in many tasks and have been deployed in applications including object detection, autonomous driving, and machine translation. However, these models carry huge numbers of parameters and impose a heavy computational burden. Neural network pruning identifies and removes parameters that contribute little to accuracy, which reduces the parameter count and the theoretical computational requirement and thus offers an opportunity to accelerate neural network models. Pruned sparse models, however, are hard to execute efficiently on GPUs, and their performance cannot even match that of their well-optimized dense counterparts. This study designs a sparsity-aware code generation method that generates efficient GPU code for sparse convolutions in pruned neural networks. First, a template for convolution operators is designed with several optimizations targeting the GPU architecture. Through compilation and analysis, the operator template is transformed into an intermediate representation template, which serves as the input to the designed algorithm that generates sparse convolution code according to the specific sparse convolution parameters. Moreover, to improve memory throughput, data access and data placement are optimized based on the memory access characteristics of neural networks. Finally, because the location information can be encoded implicitly in the generated code, the index structure for the sparse parameters can be eliminated, reducing the memory footprint during execution. Experiments demonstrate that the proposed sparse code generation method improves the performance of sparse convolutional neural networks compared with current methods.
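The idea of encoding weight locations implicitly in generated code can be illustrated with a minimal sketch. This is not the authors' generator: the paper emits GPU code from an intermediate representation template, whereas the example below emits plain Python source as a stand-in. The function names (`generate_sparse_conv_source`, `compile_generated`, `dense_conv2d`) and the single-channel, stride-1, no-padding setting are assumptions made for illustration only.

```python
def generate_sparse_conv_source(weights, func_name="sparse_conv2d"):
    """Emit Python source for a 2D convolution specialized to `weights`.

    Hypothetical helper: each nonzero weight becomes one term whose offsets
    (dy, dx) and value are literals in the source, so the generated function
    needs no index array at run time. Pruned (zero) weights emit nothing.
    """
    kh, kw = len(weights), len(weights[0])
    terms = [
        f"{float(w)!r} * x[i + {dy}][j + {dx}]"
        for dy, row in enumerate(weights)
        for dx, w in enumerate(row)
        if w != 0
    ]
    body = " + ".join(terms) if terms else "0.0"
    return (
        f"def {func_name}(x):\n"
        f"    oh = len(x) - {kh - 1}\n"
        f"    ow = len(x[0]) - {kw - 1}\n"
        f"    return [[{body} for j in range(ow)] for i in range(oh)]\n"
    )


def compile_generated(source, func_name="sparse_conv2d"):
    """Compile the generated source and return the specialized function."""
    namespace = {}
    exec(source, namespace)
    return namespace[func_name]


def dense_conv2d(x, weights):
    """Reference dense version that visits every weight, zero or not."""
    kh, kw = len(weights), len(weights[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    return [
        [
            sum(weights[dy][dx] * x[i + dy][j + dx]
                for dy in range(kh) for dx in range(kw))
            for j in range(ow)
        ]
        for i in range(oh)
    ]


# A pruned 3x3 kernel with two surviving weights, applied to a 4x4 input.
weights = [[0, 2, 0],
           [0, 0, 0],
           [-1, 0, 0]]
x = [[4 * i + j for j in range(4)] for i in range(4)]

source = generate_sparse_conv_source(weights)
conv = compile_generated(source)
print(source)
print(conv(x) == dense_conv2d(x, weights))
```

In this sketch the generated function contains exactly two multiply terms, one per surviving weight, and carries no separate row/column index structure such as the one a CSR-style sparse format would store. The GPU-specific optimizations on data access and data placement described in the abstract are outside its scope.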

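Citation:

Dong Xiao, Liu Lei, Li Jing, Feng Xiaobing. Performance Optimizing Method for Sparse Convolutional Neural Networks on GPU. Journal of Software (软件学报), 2020, 31(9): 2944-2964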


History

Received: October 5, 2019
Revised: January 13, 2020
Adopted:
Online: April 21, 2020
Published: September 6, 2020

