基于多源域适应的缺陷类别预测方法
作者:
作者单位:

作者简介:

通讯作者:

中图分类号:

TP311

基金项目:

高安全系统的软件开发与验证技术工业和信息化部重点实验室资助项目(NJ2023031),CCF-绿盟科技”鲲鹏”科研计划项目(CCF-NSFOCUS202212),云南省软件工程重点实验室开放基金(2023SE202),高可信嵌入式软件工程技术实验室开放基金(LHCESET202301)


Defect Category Prediction Based on Multi-Source Domain Adaptation
Author:
Affiliation:

Fund Project:

  • 摘要
  • |
  • 图/表
  • |
  • 访问统计
  • |
  • 参考文献
  • |
  • 相似文献
  • |
  • 引证文献
  • |
  • 资源附件
  • |
  • 文章评论
    摘要:

    随着规模和复杂性的迅猛膨胀,软件系统中不可避免地存在缺陷.近年来,基于深度学习的缺陷预测技术成为软件工程领域的研究热点.该类技术可以在不运行代码的情况下发现其中潜藏的缺陷,因而在工业界和学术界受到了广泛的关注.然而,已有方法大多关注方法级的源代码中是否存在缺陷,无法精确识别具体的缺陷类别,从而降低了开发人员进行缺陷定位及修复工作的效率.此外,在实际软件开发实践中,新的项目通常缺乏足够的缺陷数据来训练高精度的深度学习模型,而利用已有项目的历史数据训练好的模型往往在新项目上无法达到良好的泛化性能.因此,本文首先将传统的二分类缺陷预测任务表述为多标签分类问题,即使用CWE(common weakness enumeration)中描述的缺陷类别作为细粒度的模型预测标签.为了提高跨项目场景下的模型性能,本文提出一种融合对抗训练和注意力机制的多源域适应框架.具体而言,该框架通过对抗训练来减少域(即软件项目)差异,并进一步利用域不变特征来获得每个源域和目标域之间的特征相关性.同时,该框架还利用加权最大均值差异作为注意力机制以最小化源域和目标域特征之间的表示距离,从而使模型可以学习到更多的域无关特征.最后在八个真实世界的开源项目上与最先进的基线方法进行大量对比实验验证了所提方法的有效性.

    Abstract:

    With the rapid expansion of scale and complexity, defects inevitably exist within software systems. In recent years, defect prediction techniques based on deep learning have become a prominent research topic in the field of software engineering. These techniques can identify potential defects without executing the code, garnering significant attention from both industry and academia. However, existing approaches mostly concentrate on determining the presence of defects at the method-level code, lacking the ability to precisely classify specific defect categories. Consequently, this undermines the efficiency of developers in locating and rectifying defects. Furthermore, in practical software development, new projects often lack sufficient defect data to train high-accuracy deep learning models. Models trained on historical data from existing projects frequently struggle to achieve satisfactory generalization performance on new projects. Hence, this paper initially reformulates the traditional binary defect prediction task into a multi-label classification problem, employing defect categories described in the Common Weakness Enumeration (CWE) as fine-grained predictive labels. To enhance the model performance in cross-project scenarios, this paper proposes a multi-source domain adaptation framework that integrates adversarial training and attention mechanisms. Specifically, the proposed framework employs adversarial training to mitigate domain (i.e., software projects) discrepancies, and further utilizes domain-invariant features to capture feature correlations between each source domain and the target domain. Simultaneously, the proposed framework employs a weighted maximum mean discrepancy as an attention mechanism to minimize the representation distance between source and target domain features, facilitating model in learning more domain-independent features. Finally, this paper conducts extensive experiments on eight real-world open-source projects to verify the effectiveness of the proposed approach by comparing it with the state-of-the-art baselines.

    参考文献
    相似文献
    引证文献
引用本文

邢颖,赵梦赐,杨斌,张俞炜,李文瑾,顾佳伟,袁军.基于多源域适应的缺陷类别预测方法.软件学报,2024,35(7):0

复制
分享
文章指标
  • 点击次数:
  • 下载次数:
  • HTML阅读次数:
  • 引用次数:
历史
  • 收稿日期:2023-09-10
  • 最后修改日期:2023-10-30
  • 录用日期:
  • 在线发布日期: 2024-01-05
  • 出版日期:
您是第位访问者
版权所有:中国科学院软件研究所 京ICP备05046678号-3
地址:北京市海淀区中关村南四街4号,邮政编码:100190
电话:010-62562563 传真:010-62562533 Email:jos@iscas.ac.cn
技术支持:北京勤云科技发展有限公司

京公网安备 11040202500063号