Journal of Software, 2020, 31(2): 266-281

Domain Adaptation Approach for Cross-project Software Defect Prediction
CHEN Shu (陈曙), YE Jun-Min (叶俊民), LIU Tong (刘童)
(School of Computer, Central China Normal University, Wuhan 430079, Hubei, China)
Received: December 17, 2017    Revised: April 12, 2018
Abstract: Software defect prediction helps developers find and locate potentially defective software components early, so that testing resources can be allocated more effectively and product quality improved. Cross-project defect prediction trains a model on defect data from existing projects and uses it to predict defects in a new project, but its performance is often poor. The main reason is that sample datasets drawn from different projects differ substantially in their probability distributions, which seriously degrades prediction accuracy. To address this problem, this study proposes a cross-project defect prediction approach based on supervised domain adaptation with instance weighting. Instance-weighted domain adaptation is combined with the training of a machine-learning prediction model: target-dependent weights are constructed and applied to the abundant source-project instances, so that they enter the loss function minimized over the source data and influence how the model parameters are learned. In this way the distribution properties of the target project's defect data are adapted into the training set, which enables the reuse of defect data and cross-project defect prediction. The approach is evaluated empirically on ten large open-source software projects, and different experimental settings are analyzed in terms of datasets, data preprocessing, and results. Overfitting of the prediction model is also analyzed at the level of the data, the prediction model, and the adaptation process. The results show that the proposed approach outperforms similar approaches, is significantly better than the baseline, and approaches or matches the performance of within-project defect prediction.
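The instance-weighting idea summarized in the abstract can be illustrated with a minimal sketch: each source-project instance receives a weight that depends on the target project, and the weight scales that instance's contribution to the training loss. Everything below is an assumption made for illustration and is not taken from the paper: the Gaussian-kernel weighting in `similarity_weights`, the choice of logistic regression, the function names, the hyperparameters, and the toy data.

```python
import math

def similarity_weights(source_X, target_X, bandwidth=1.0):
    """Hypothetical weighting: mean Gaussian-kernel similarity of each
    source instance to the target instances, normalised to mean 1."""
    weights = []
    for x in source_X:
        total = 0.0
        for t in target_X:
            sq_dist = sum((a - b) ** 2 for a, b in zip(x, t))
            total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        weights.append(total / len(target_X))
    mean_w = sum(weights) / len(weights)
    return [w / mean_w for w in weights]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(x, coef, bias):
    """Predicted probability that instance x is defective."""
    return sigmoid(bias + sum(c * v for c, v in zip(coef, x)))

def train_weighted_logreg(X, y, weights, lr=0.1, epochs=500):
    """Gradient descent on the weighted log loss
    sum_i w_i * loss(x_i, y_i) / sum_i w_i."""
    n_features = len(X[0])
    coef = [0.0] * n_features
    bias = 0.0
    total_w = sum(weights)
    for _ in range(epochs):
        grad_coef = [0.0] * n_features
        grad_bias = 0.0
        for x, label, w in zip(X, y, weights):
            # The instance weight scales this instance's gradient.
            err = w * (predict_proba(x, coef, bias) - label)
            for j, v in enumerate(x):
                grad_coef[j] += err * v
            grad_bias += err
        coef = [c - lr * g / total_w for c, g in zip(coef, grad_coef)]
        bias -= lr * grad_bias / total_w
    return coef, bias

# Toy data (made up): two software metrics per module, label 1 = defective.
source_X = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1], [3.0, 3.0]]
source_y = [0, 0, 1, 1, 1]
target_X = [[0.1, 0.0], [1.0, 0.9]]

weights = similarity_weights(source_X, target_X)
coef, bias = train_weighted_logreg(source_X, source_y, weights)
```

In this sketch the source instance `[3.0, 3.0]`, which lies far from every target instance, receives a much smaller weight than the others and therefore has little influence on the learned parameters. How the weights are actually constructed in the paper is not stated in the abstract.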
CLC number: TP311
Foundation items: National Key Technology Research and Development Program of China (2015BAK33B00)
Reference text:

CHEN Shu, YE Jun-Min, LIU Tong. Domain Adaptation Approach for Cross-project Software Defect Prediction. Journal of Software, 2020, 31(2): 266-281