基于代码属性图及注意力双向LSTM的漏洞挖掘方法
作者:
作者单位:

作者简介:

段旭(1997-),男,硕士生,主要研究领域为漏洞挖掘,智能安全.
吴敬征(1982-),男,博士,副研究员,主要研究领域为系统安全,漏洞挖掘,移动安全.
罗天悦(1990-),男,工程师,主要研究领域为操作系统安全分析,代码漏洞挖掘,人工智能安全.
杨牧天(1990-),男,工程师,主要研究领域为开源软件安全,安全漏洞挖掘检测,人工智能安全.
武延军(1979-),男,博士,研究员,博士生导师,CCF高级会员,主要研究领域为操作系统,机器学习系统软件,系统安全.

通讯作者:

吴敬征,E-mail:jingzheng08@iscas.ac.cn

中图分类号:

基金项目:

国家重点研发计划(2018YFB0803600);国家自然科学基金(61772507);北京市科委产业技术创新战略联盟促进专项(Z181100000518032)


Vulnerability Mining Method Based on Code Property Graph and Attention BiLSTM
Author:
Affiliation:

Fund Project:

National Key Research and Development Program of China (2018YFB0803600); National Natural Science Foundation of China (61772507); Special Promotion of Industrial Technology Innovation Strategic Alliance of Beijing Municipal Science and Technology Commission (Z181100000518032)

  • 摘要
  • |
  • 图/表
  • |
  • 访问统计
  • |
  • 参考文献
  • |
  • 相似文献
  • |
  • 引证文献
  • |
  • 资源附件
  • |
  • 文章评论
    摘要:

    随着信息安全愈发严峻的趋势,软件漏洞已成为计算机安全的主要威胁之一.如何准确地挖掘程序中存在的漏洞,是信息安全领域的关键问题.然而,现有的静态漏洞挖掘方法在挖掘漏洞特征不明显的漏洞时准确率明显下降.一方面,基于规则的方法通过在目标源程序中匹配专家预先定义的漏洞模式挖掘漏洞,其预定义的漏洞模式较为刻板单一,无法覆盖到细节特征,导致其存在准确率低、误报率高等问题;另一方面,基于学习的方法无法充分地对程序源代码的特征信息进行建模,并且无法有效地捕捉关键特征信息,导致其在面对漏洞特征不明显的漏洞时,无法准确地进行挖掘.针对上述问题,提出了一种基于代码属性图及注意力双向LSTM的源码级漏洞挖掘方法.该方法首先将程序源代码转换为包含语义特征信息的代码属性图,并对其进行切片以剔除与敏感操作无关的冗余信息;其次,使用编码算法将代码属性图编码为特征张量;然后,利用大规模特征数据集训练基于双向LSTM和注意力机制的神经网络;最后,使用训练完毕的神经网络实现对目标程序中的漏洞进行挖掘.实验结果显示,在SARD缓冲区错误数据集、SARD资源管理错误数据集及它们两个C语言程序构成的子集上,该方法的F1分数分别达到了82.8%,77.4%,82.5%和78.0%,与基于规则的静态挖掘工具Flawfinder和RATS以及基于学习的程序分析模型TBCNN相比,有显著的提高.

    Abstract:

    With the increasingly serious trend of information security, software vulnerability has become one of the main threats to computer security. How to accurately mine vulnerabilities in the program is a key issue in the field of information security. However, existing static vulnerability mining methods have low accuracy when mining vulnerabilities with unobvious vulnerability features. On the one hand, rule-based methods by matching expert-defined code vulnerability patterns in target programs. Its predefined vulnerability pattern is rigid and single, which is unable to cover detailed features and result in problems of low accuracy and high false positives. On the other hand, learning-based methods cannot adequately model the features of the source code and cannot effectively capture the key feature, which makes it fail to accurately mine vulnerabilities with unobvious vulnerability features. To solve this issue, a source code level vulnerability mining method based on code property graph and attention BiLSTM is proposed. It firstly transforms the program source code to code property graph which contains semantic features, and performs program slicing to remove redundant information that is not related to sensitive operations. Then, it encodes the code property graph into the feature tensor with encoding algorithm. After that, a neural network based on BiLSTM and attention mechanism is trained using large-scale feature datasets. Finally, the trained neural network model is used to mine the vulnerabilities in the target program. Experimental results show that the F1 scores of the method reach 82.8%, 77.4%, 82.5%, and 78.0% respectively on the SARD buffer error dataset, SARD resource management error dataset, and their two subsets composed of C programs, which is significantly higher than the rule-based static mining tools Flawfinder and RATS and the learning-based program analysis model TBCNN.

    参考文献
    相似文献
    引证文献
引用本文

段旭,吴敬征,罗天悦,杨牧天,武延军.基于代码属性图及注意力双向LSTM的漏洞挖掘方法.软件学报,2020,31(11):3404-3420

复制
分享
文章指标
  • 点击次数:
  • 下载次数:
  • HTML阅读次数:
  • 引用次数:
历史
  • 收稿日期:2019-07-08
  • 最后修改日期:2020-04-11
  • 录用日期:
  • 在线发布日期: 2020-11-07
  • 出版日期: 2020-11-06
您是第位访问者
版权所有:中国科学院软件研究所 京ICP备05046678号-3
地址:北京市海淀区中关村南四街4号,邮政编码:100190
电话:010-62562563 传真:010-62562533 Email:jos@iscas.ac.cn
技术支持:北京勤云科技发展有限公司

京公网安备 11040202500063号