[关键词]
[摘要]
跨项目软件缺陷预测技术可以利用现有的已标注缺陷数据集对新的无标记项目进行预测,但需要两者之间具有相同的度量集合,难以用于实际开发.异构缺陷预测技术可以在具有异构度量集合的项目间进行缺陷预测,该技术引起了大量研究人员的关注.现有的异构缺陷预测技术利用朴素的或者传统机器学习方法为源项目和目标项目学习特征表示,所学习到的特征表示能力很弱且缺陷预测性能很差.鉴于深度神经网络强大的特征抽取和表示能力,基于变分自编码器技术提出了一种面向异构缺陷预测的特征表示方法.该模型结合了变分自编码器和最大均值差异距离,能够有效地学习源项目和目标项目的共性特征表示,基于该特征表示可以训练出有效的缺陷预测模型.在多组缺陷数据集上通过与传统跨项目缺陷预测方法及异构缺陷预测方法实验对比验证了所提方法的有效性.
[Key word]
[Abstract]
Cross-project defect prediction technology can use the existing labeled defect data to predict new unlabeled data, but it needs to have the same metric features for two projects, which is difficult to be applied in actual development. Heterogeneous defect prediction can perform prediction without requiring the source and target project to have the same set of metrics and thus has attracted great interest. Existing heterogeneous defect prediction models use naive or traditional machine learning methods to learn feature representations between source and target projects, and perform prediction based on it. The feature representation learned by previous studies is weak, causing poor performance in predicting defect-prone instances. In view of the powerful feature extraction and representation capabilities of deep neural networks, this study proposes a feature representation method for heterogeneous defect prediction based on variational autoencoders. By combining the variational autoencoder and maximum mean discrepancy, this method can effectively learn the common feature representation of the source and target projects. Then, an effective defect prediction model can be trained based on it. The validity of the proposed method is verified by comparing it with traditional cross-project defect prediction methods and heterogeneous defect prediction methods on various datasets.
[中图分类号]
[基金项目]
国家自然科学基金(61906090,U20B2064,61773208);江苏省自然科学基金(BK20191287,BK20170809);中央高校基本科研业务费专项资金(30920021131);中国博士后科学基金(2018M632304)