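A novel robustness verification method for source code processing based artificial intelligence systems

Funding:

National Natural Science Foundation of China (62072227, 62202219); National Key Research and Development Program of China (2019YFE0105500); Jiangsu Provincial Key Research and Development Program (BE2021002-2); Innovation Project of the State Key Laboratory for Novel Software Technology, Nanjing University (ZZKT2022A25); Overseas Open Project (KFKT2022A09)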


    Abstract:

    Advances in artificial intelligence (AI) have provided strong support for AI systems that process source code. Compared with natural language, source code has a distinctive semantic space, so machine learning tasks on source code usually rely on abstract syntax trees, data dependency graphs, control flow graphs, and similar representations to obtain structural information and extract features. Through in-depth analysis of source code structure and flexible use of classifiers, existing studies achieve excellent results in experimental settings. In real applications, however, where source code structure is more complex, most such AI systems suffer a sharp drop in performance and are difficult to deploy in industry, which has prompted practitioners to consider the robustness of AI systems. Because AI-based systems are generally data-driven black boxes, measuring their robustness directly is difficult. With the rise of adversarial attack techniques, researchers in natural language processing have designed adversarial attacks for various tasks to verify model robustness and have conducted large-scale empirical studies. To address the instability of source code processing AI systems on complex code, this paper proposes a robustness verification method, RVMHM (Robustness Verification by Metropolis-Hastings attack Method). RVMHM first uses an abstract-syntax-tree-based code preprocessing tool to extract the model's variable pool, and then uses the MHM source code attack algorithm to replace variables and thereby perturb the model's predictions. By interfering with the interaction between data and model, it measures the robustness of the AI system as the change in robustness verification metrics before and after the attack. Taking vulnerability prediction as a typical binary classification scenario in source code processing, this paper verifies the robustness of 12 groups of AI vulnerability prediction models on datasets from three open source projects, demonstrating the effectiveness of RVMHM for robustness verification of source code processing AI systems.
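The three steps named in the abstract (extract a variable pool from the abstract syntax tree, rename variables with a Metropolis-Hastings style attack, compare a metric before and after) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: Python's built-in `ast` module stands in for the paper's preprocessing tool, `toy_model` is a hypothetical and deliberately brittle classifier, the acceptance rule is a simplified Metropolis-Hastings ratio rather than the exact MHM formulation, and accuracy stands in for the robustness verification metrics. It requires Python 3.9 or later for `ast.unparse`.

```python
import ast
import random


def variable_pool(source: str) -> list[str]:
    """Collect identifiers that are assigned to or declared as parameters."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            names.add(node.id)
        elif isinstance(node, ast.arg):
            names.add(node.arg)
    return sorted(names)


class _Rename(ast.NodeTransformer):
    """Rename every occurrence of one identifier (uses and parameters)."""

    def __init__(self, old: str, new: str):
        self.old, self.new = old, new

    def visit_Name(self, node):
        if node.id == self.old:
            node.id = self.new
        return node

    def visit_arg(self, node):
        if node.arg == self.old:
            node.arg = self.new
        return node


def rename(source: str, old: str, new: str) -> str:
    return ast.unparse(_Rename(old, new).visit(ast.parse(source)))


def toy_model(source: str) -> float:
    """Hypothetical classifier returning P(vulnerable).

    It keys on identifier names only, which makes it easy to fool.
    """
    names = variable_pool(source)
    hits = sum("buf" in name for name in names)
    return 0.1 + 0.8 * hits / max(len(names), 1)


def mhm_attack(source, label, model, candidates, steps=50, seed=0):
    """Simplified Metropolis-Hastings renaming attack (assumed variant).

    `candidates` must not clash with other names used in the code.
    """
    rng = random.Random(seed)

    def p_true(src):
        p = model(src)
        return p if label == 1 else 1.0 - p

    current = source
    for _ in range(steps):
        if p_true(current) < 0.5:  # prediction already flipped
            break
        pool = variable_pool(current)
        if not pool:
            break
        old, new = rng.choice(pool), rng.choice(candidates)
        if new in pool:  # skip proposals that would merge two variables
            continue
        proposal = rename(current, old, new)
        # Accept when the proposal lowers confidence in the true label,
        # and otherwise with probability equal to the ratio below.
        ratio = (1.0 - p_true(proposal) + 1e-9) / (1.0 - p_true(current) + 1e-9)
        if rng.random() < min(1.0, ratio):
            current = proposal
    return current


def accuracy(samples, model):
    return sum((model(src) >= 0.5) == bool(y) for src, y in samples) / len(samples)


def robustness_check(samples, model, candidates):
    """Return accuracy before and after attacking every sample."""
    attacked = [(mhm_attack(s, y, model, candidates), y) for s, y in samples]
    return accuracy(samples, model), accuracy(attacked, model)


SAMPLES = [
    ("def f(buf, n):\n    tmp_buf = buf[:n]\n    return tmp_buf\n", 1),
    ("def g(a, b):\n    total = a + b\n    return total\n", 0),
]
before, after = robustness_check(SAMPLES, toy_model, ["x1", "x2", "x3"])
print(f"accuracy before attack: {before}, after attack: {after}")
```

Because the renaming preserves program behavior, any drop in accuracy reflects the model's sensitivity to identifier names and not a change in what the code does. A larger drop indicates a less robust model.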

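Cite this article

Yang Yanjing, Mao Runfeng, Tan Rui, Shen Haifeng, Rong Guoping. A novel robustness verification method for source code processing based artificial intelligence systems. Journal of Software, 2023, 34(9):0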

History
  • Received: 2022-09-05
  • Last revised: 2022-12-14
  • Published online: 2023-01-13