Journal of Software:2020.31(11):3404-3420

(智能软件研究中心(中国科学院 软件研究所), 北京 100190;中国科学院大学, 北京 100049;智能软件研究中心(中国科学院 软件研究所), 北京 100190;计算机科学国家重点实验室(中国科学院 软件研究所), 北京 100190)
Vulnerability Mining Method Based on Code Property Graph and Attention BiLSTM
DUAN Xu,WU Jing-Zheng,LUO Tian-Yue,YANG Mu-Tian,WU Yan-Jun
(Intelligent Software Research Center(Institute of Software, Chinese Academy of Sciences), Beijing 100190, China;University of Chinese Academy of Sciences, Beijing 100049, China;Intelligent Software Research Center(Institute of Software, Chinese Academy of Sciences), Beijing 100190, China;State Key Laboratory of Computer Science(Institute of Software, Chinese Academy of Sciences), Beijing 100190, China)
Chart / table
Similar Articles
Article :Browse 115   Download 231
Received:July 08, 2019    Revised:April 11, 2020
> 中文摘要: 随着信息安全愈发严峻的趋势,软件漏洞已成为计算机安全的主要威胁之一.如何准确地挖掘程序中存在的漏洞,是信息安全领域的关键问题.然而,现有的静态漏洞挖掘方法在挖掘漏洞特征不明显的漏洞时准确率明显下降.一方面,基于规则的方法通过在目标源程序中匹配专家预先定义的漏洞模式挖掘漏洞,其预定义的漏洞模式较为刻板单一,无法覆盖到细节特征,导致其存在准确率低、误报率高等问题;另一方面,基于学习的方法无法充分地对程序源代码的特征信息进行建模,并且无法有效地捕捉关键特征信息,导致其在面对漏洞特征不明显的漏洞时,无法准确地进行挖掘.针对上述问题,提出了一种基于代码属性图及注意力双向LSTM的源码级漏洞挖掘方法.该方法首先将程序源代码转换为包含语义特征信息的代码属性图,并对其进行切片以剔除与敏感操作无关的冗余信息;其次,使用编码算法将代码属性图编码为特征张量;然后,利用大规模特征数据集训练基于双向LSTM和注意力机制的神经网络;最后,使用训练完毕的神经网络实现对目标程序中的漏洞进行挖掘.实验结果显示,在SARD缓冲区错误数据集、SARD资源管理错误数据集及它们两个C语言程序构成的子集上,该方法的F1分数分别达到了82.8%,77.4%,82.5%和78.0%,与基于规则的静态挖掘工具Flawfinder和RATS以及基于学习的程序分析模型TBCNN相比,有显著的提高.
Abstract:With the increasingly serious trend of information security, software vulnerability has become one of the main threats to computer security. How to accurately mine vulnerabilities in the program is a key issue in the field of information security. However, existing static vulnerability mining methods have low accuracy when mining vulnerabilities with unobvious vulnerability features. On the one hand, rule-based methods by matching expert-defined code vulnerability patterns in target programs. Its predefined vulnerability pattern is rigid and single, which is unable to cover detailed features and result in problems of low accuracy and high false positives. On the other hand, learning-based methods cannot adequately model the features of the source code and cannot effectively capture the key feature, which makes it fail to accurately mine vulnerabilities with unobvious vulnerability features. To solve this issue, a source code level vulnerability mining method based on code property graph and attention BiLSTM is proposed. It firstly transforms the program source code to code property graph which contains semantic features, and performs program slicing to remove redundant information that is not related to sensitive operations. Then, it encodes the code property graph into the feature tensor with encoding algorithm. After that, a neural network based on BiLSTM and attention mechanism is trained using large-scale feature datasets. Finally, the trained neural network model is used to mine the vulnerabilities in the target program. Experimental results show that the F1 scores of the method reach 82.8%, 77.4%, 82.5%, and 78.0% respectively on the SARD buffer error dataset, SARD resource management error dataset, and their two subsets composed of C programs, which is significantly higher than the rule-based static mining tools Flawfinder and RATS and the learning-based program analysis model TBCNN.
文章编号:     中图分类号:TP311    文献标志码:
基金项目:国家重点研发计划(2018YFB0803600);国家自然科学基金(61772507);北京市科委产业技术创新战略联盟促进专项(Z181100000518032) 国家重点研发计划(2018YFB0803600);国家自然科学基金(61772507);北京市科委产业技术创新战略联盟促进专项(Z181100000518032)
Foundation items:National Key Research and Development Program of China (2018YFB0803600); National Natural Science Foundation of China (61772507); Special Promotion of Industrial Technology Innovation Strategic Alliance of Beijing Municipal Science and Technology Commission (Z181100000518032)
Reference text:


DUAN Xu,WU Jing-Zheng,LUO Tian-Yue,YANG Mu-Tian,WU Yan-Jun.Vulnerability Mining Method Based on Code Property Graph and Attention BiLSTM.Journal of Software,2020,31(11):3404-3420