Slice-level Vulnerability Detection and Interpretation Method Based on Graph Neural Network
Authors: HU Yutao, WANG Suyuan, WU Yueming, ZOU Deqing, LI Wenke, JIN Hai

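Author biographies:

HU Yutao (1997-), female, Ph.D. candidate, CCF student member. Her main research interests are vulnerability detection, code analysis, and clone detection.
WANG Suyuan (2000-), male, master's student. His main research interests are machine learning, vulnerability detection, and software engineering.
WU Yueming (1993-), male, Ph.D., CCF student member. His main research interests are mobile security, software supply chain security, artificial intelligence security, malware analysis, vulnerability analysis, and code clone auditing.
ZOU Deqing (1975-), male, Ph.D., professor, doctoral supervisor, CCF senior member. His main research interests are cloud computing security, network attack and defense and vulnerability detection, software-defined security and active defense, big data security and artificial intelligence security, and fault-tolerant computing.
LI Wenke (2000-), male, master's student. His main research interests are software engineering and vulnerability detection.
JIN Hai (1966-), male, Ph.D., professor, doctoral supervisor, CCF Fellow, IEEE Fellow, ACM Life Member. His main research interests are computer architecture, virtualization technology, cluster computing, grid computing, parallel and distributed computing, peer-to-peer computing, pervasive computing, the semantic web, storage, and security.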

Corresponding author:

WU Yueming, wuyueming21@gmail.com


Fund Project:

National Natural Science Foundation of China (62172168); Key Research and Development Program of Hubei Province (2021BAA032)


    Abstract:

As software grows increasingly complex, the demand for research on vulnerability detection keeps rising. Discovering and patching software vulnerabilities quickly minimizes the damage they cause. Deep learning-based vulnerability detection, an emerging approach, automatically learns the latent vulnerability patterns in vulnerable code and saves considerable manual effort. However, these methods are still immature. Function-level detection methods suffer from coarse detection granularity and low detection accuracy. Slice-level detection methods effectively reduce sample noise but still face two problems. On the one hand, most existing methods are evaluated on artificial vulnerability datasets, so their ability to detect vulnerabilities in real-world environments remains questionable. On the other hand, existing work only determines whether a slice sample is vulnerable and does not consider the interpretability of the detection results. To address these issues, this study proposes a slice-level vulnerability detection and interpretation method based on graph neural networks. The method first normalizes C/C++ source code and extracts slices to reduce interference from redundant information in the samples. Second, it uses a graph neural network model to embed each slice into a vector representation that preserves the structural information and vulnerability features of the source code. Third, the slice vectors are fed into a vulnerability detection model for training and prediction. Finally, the trained detection model and the vulnerable slices to be explained are fed into a vulnerability interpreter, which pinpoints the specific vulnerable lines of code.
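The abstract says the method normalizes C/C++ source code before slicing but does not give the rules. The Python sketch below assumes a scheme common in slice-level detectors, not confirmed by the source: user-defined variable and function names are replaced by positional placeholders (VAR1, FUN1, ...) while keywords and library calls are kept. The function `normalize` and the keyword and library sets are hypothetical and intentionally small.

```python
import re

# Illustrative, deliberately incomplete token sets (assumptions, not the paper's lists).
C_KEYWORDS = {
    "int", "char", "void", "if", "else", "for", "while", "return",
    "sizeof", "struct", "unsigned", "long", "short", "const", "static",
}
LIBRARY_CALLS = {"memcpy", "strcpy", "strncpy", "strlen", "malloc", "free", "printf"}

# A C identifier: letter or underscore, then letters, digits, or underscores.
IDENT = re.compile(r"\b[A-Za-z_][A-Za-z0-9_]*\b")


def normalize(lines):
    """Map user-defined identifiers in a slice to VARn / FUNn placeholders.

    An identifier directly followed by '(' is treated as a function name.
    Keywords and known library calls are kept, so API semantics survive.
    String/char literals and macros are not handled in this sketch.
    """
    var_map, fun_map = {}, {}

    def rename(match, line):
        name = match.group(0)
        if name in C_KEYWORDS or name in LIBRARY_CALLS:
            return name
        is_call = line[match.end():].lstrip().startswith("(")
        table, prefix = (fun_map, "FUN") if is_call else (var_map, "VAR")
        if name not in table:
            # Number placeholders in order of first appearance across the slice.
            table[name] = f"{prefix}{len(table) + 1}"
        return table[name]

    return [IDENT.sub(lambda m, ln=line: rename(m, ln), line) for line in lines]


if __name__ == "__main__":
    slice_lines = [
        "char buf[10];",
        "int len = get_len(input);",
        "memcpy(buf, input, len);",
    ]
    for out in normalize(slice_lines):
        print(out)
```

Run as a script, it prints `char VAR1[10];`, `int VAR2 = FUN1(VAR3);` and `memcpy(VAR1, VAR3, VAR2);`. Leaving `memcpy` unrenamed is the key design choice here: the library call carries the vulnerability semantics, while concrete variable names are mostly noise across samples.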
Experimental results show that, for vulnerability detection, the method achieves an F1 score of 75.1% on real-world vulnerability data, which is 41.2%-110.4% higher than the compared methods. For vulnerability interpretation, when limited to the top 10% of critical nodes, the method reaches an accuracy of 73.6%, which is 8.9% and 24.9% higher than the two compared interpreters, while reducing the time overhead by 42.5% and 15.4%, respectively. Finally, the method correctly detects and explains 59 real vulnerabilities in four open-source software projects, demonstrating its practicality for real-world vulnerability discovery.
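The abstract reports interpretation accuracy "when limiting the top 10% of critical nodes" without defining the metric. One plausible reading, assumed here rather than taken from the paper: the interpreter assigns an importance score to every node (statement) of a slice graph, the top 10% of nodes by score are kept, and a slice counts as correctly explained if any kept node maps to a ground-truth vulnerable line. The helpers `top_k_nodes` and `interpretation_accuracy` are hypothetical.

```python
def top_k_nodes(scores, percent=10):
    """Return ids of the highest-scoring nodes: ceil(percent% of n), at least one."""
    n = len(scores)
    k = max(1, (n * percent + 99) // 100)  # integer ceiling, avoids float rounding
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


def interpretation_accuracy(samples, percent=10):
    """Fraction of slices whose top-ranked nodes hit at least one vulnerable line.

    Each sample is a tuple (scores, node_to_line, vulnerable_lines):
      scores:           dict node_id -> importance score from the interpreter
      node_to_line:     dict node_id -> source line number of that node
      vulnerable_lines: set of ground-truth vulnerable line numbers
    """
    if not samples:
        return 0.0
    hits = 0
    for scores, node_to_line, vulnerable_lines in samples:
        selected = top_k_nodes(scores, percent)
        if any(node_to_line[node] in vulnerable_lines for node in selected):
            hits += 1
    return hits / len(samples)
```

With `percent=10`, a 25-node slice keeps 3 nodes (the ceiling of 2.5), and a slice with fewer than 10 nodes still keeps one, so small slices are never left without an explanation.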

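Cite this article:

HU Yutao, WANG Suyuan, WU Yueming, ZOU Deqing, LI Wenke, JIN Hai. Slice-level Vulnerability Detection and Interpretation Method Based on Graph Neural Network. Journal of Software, 2023, 34(6): 2543-2561.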

History
  • Received: 2022-09-05
  • Last revised: 2022-10-10
  • Published online: 2023-01-13
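Copyright: Institute of Software, Chinese Academy of Sciences
Address: No. 4 South Fourth Street, Zhongguancun, Haidian District, Beijing, Postal code: 100190
Tel: 010-62562563 Fax: 010-62562533 Email: jos@iscas.ac.cn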