Just-In-Time Defect Prediction for Intelligent Computing Frameworks

Fund Project: National Natural Science Foundation of China (61772200); Natural Science Foundation of Shanghai (21ZR1416300).

    Abstract:

    Intelligent computing frameworks have been widely adopted in recent years as the engineering tools that put artificial intelligence (AI) into practice, and their reliability is critical to effective AI implementation. Assuring that reliability is challenging, however. On the one hand, the code of these frameworks iterates rapidly and is difficult to test. On the other hand, unlike traditional software, the frameworks involve a large amount of tensor computation, and their coding conventions lack guidance from software engineering theory. Existing work mainly uses fuzz testing to localize defects. Such methods can precisely locate only specific types of defects, and they cannot prompt developers to attend to software quality in a timely manner during development. This paper therefore takes popular intelligent computing frameworks (TensorFlow, Baidu PaddlePaddle, etc.) as its subjects, builds datasets from a variety of change features, and performs just-in-time defect prediction at the commit level. On this basis, we use LDA topic modeling to mine semantic information from code and commit messages as additional features, and use Random Forest as the classifier. The results show an average AUC-ROC of 0.77, and the semantic features slightly improve prediction performance. Finally, we use SHAP, an explainable machine learning method, to analyze the influence of each feature on the model's predictions. We find that (1) the influence of basic features on the model is consistent with the patterns of traditional software development; (2) semantic features from code and commit messages have an important influence on the predictions; and (3) the ranking of feature contributions to the model output differs considerably across systems.
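
To make the reported metric concrete, the following is a minimal sketch of how an AUC-ROC value can be computed for commit-level predictions. It is not the authors' evaluation code. The function name `auc_roc`, the labels, and the risk scores are all hypothetical and serve only as an illustration. The sketch uses the pairwise interpretation of AUC-ROC: the probability that a randomly chosen defect-inducing commit receives a higher predicted risk than a randomly chosen clean commit, with ties counted as one half.

```python
from itertools import product


def auc_roc(labels, scores):
    """Pairwise AUC-ROC: the fraction of (defective, clean) commit pairs
    in which the defective commit gets the higher score; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one commit of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = defect-inducing commit, 0 = clean commit.
labels = [1, 0, 1, 0]
# Hypothetical risk scores, e.g. a classifier's predicted probabilities.
scores = [0.9, 0.1, 0.4, 0.6]
example_auc = auc_roc(labels, scores)
```

Under this reading, an average AUC-ROC of 0.77 means that, averaged over the studied frameworks, the model ranks a defect-inducing commit above a clean one in roughly 77% of such pairs. A value of 0.5 corresponds to random ranking.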

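Cite this article

Ge Jian, Yu Huiqun, Fan Guisheng, Tang Jianhao, Huang Zijie. Just-In-Time Defect Prediction for Intelligent Computing Frameworks. Journal of Software (软件学报), 2023, 34(9):0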

History
  • Received: 2022-09-04
  • Last revised: 2022-10-13
  • Published online: 2023-01-13