Journal of Software, 2017, 28(6): 1455-1473

Semi-Supervised Ensemble Learning Approach for Cross-Project Defect Prediction
HE Ji-Yuan, MENG Zhao-Peng, CHEN Xiang, WANG Zan, FAN Xiang-Yu
(Department of Software Engineering, School of Computer Software, Tianjin University, Tianjin 300072, China; School of Computer Science and Technology, Nantong University, Nantong 226019, China)
Received: July 28, 2016; Revised: October 11, 2016
Abstract: Software defect prediction helps developers allocate testing resources early in a project by identifying in advance the software modules that are likely to contain defects. Most early studies focus on within-project defect prediction, which requires sufficient historical data from the same project. In practice, however, the project under prediction may have little historical data, or may be entirely new. Cross-project defect prediction, which trains on data from other (source) projects and predicts on a different target project, has therefore become an active research topic. Its main challenges are the difference in data distribution between source and target projects and the class imbalance within the datasets. Inspired by search-based software engineering, this paper proposes S3EL, a search-based semi-supervised ensemble learning approach for cross-project defect prediction. S3EL first builds several Naïve Bayes base classifiers by adjusting the proportion of each class in the training set. It then uses a genetic algorithm, chosen for its global search capability, together with a small number of labeled target instances to combine these base classifiers into the final prediction model. S3EL is compared with several classical cross-project defect prediction approaches (Burak filter, Peters filter, TCA+, CODEP, and HYDRA) on the Promise and AEEEM datasets. With F1 as the evaluation measure, the results show that S3EL achieves the best prediction performance in most cases.
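The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract only states that base Naïve Bayes classifiers are built from training sets with adjusted class proportions and that a genetic algorithm combines them using a few labeled target instances. Everything else here is an assumption made for illustration, including the resampling scheme (`resample`), the weighted-average vote, the use of F1 on the labeled target instances as the fitness function, the genetic operators and their parameters, and the synthetic data produced by the hypothetical `make_project` helper.

```python
import math
import random

def f1_score(y_true, y_pred):
    """F1 for the defective class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def resample(X, y, defect_ratio, rng):
    """Assumed scheme: draw len(X) rows with replacement at a given defect ratio."""
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    n_pos = max(1, round(len(X) * defect_ratio))
    n_neg = max(1, len(X) - n_pos)
    rows = [(rng.choice(pos), 1) for _ in range(n_pos)]
    rows += [(rng.choice(neg), 0) for _ in range(n_neg)]
    return [r[0] for r in rows], [r[1] for r in rows]

def fit_nb(X, y):
    """Gaussian Naive Bayes: {label: (log_prior, means, variances)}."""
    model = {}
    for label in (0, 1):
        rows = [x for x, l in zip(X, y) if l == label]
        means = [sum(col) / len(rows) for col in zip(*rows)]
        variances = [sum((v - m) ** 2 for v in col) / len(rows) + 1e-6
                     for col, m in zip(zip(*rows), means)]
        model[label] = (math.log(len(rows) / len(X)), means, variances)
    return model

def defect_prob(model, x):
    """Posterior probability that module x is defective."""
    log_post = {}
    for label, (log_prior, means, variances) in model.items():
        log_post[label] = log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances))
    top = max(log_post.values())  # subtract the max for numerical stability
    e = {k: math.exp(v - top) for k, v in log_post.items()}
    return e[1] / (e[0] + e[1])

def vote(weights, row):
    """Assumed combination rule: weighted average of base probabilities."""
    total = sum(weights) or 1.0
    return 1 if sum(w * p for w, p in zip(weights, row)) / total >= 0.5 else 0

def search_weights(probs, y_labeled, rng, pop_size=20, generations=30):
    """Genetic search for ensemble weights; fitness = F1 on labeled target data."""
    def fitness(weights):
        return f1_score(y_labeled, [vote(weights, row) for row in probs])
    n = len(probs[0])
    population = [[rng.random() for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        next_gen = population[:2]  # elitism: keep the two best individuals
        while len(next_gen) < pop_size:
            # tournament selection of two parents
            a, b = (max(rng.sample(population, 3), key=fitness) for _ in range(2))
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child = [min(1.0, max(0.0, w + rng.gauss(0, 0.1)))
                     if rng.random() < 0.2 else w for w in child]  # mutation
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)

def ensemble_predict(models, weights, X):
    return [vote(weights, [defect_prob(m, x) for m in models]) for x in X]

def make_project(rng, n, shift, defect_rate):
    """Hypothetical synthetic project: two metrics, defective modules score higher."""
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < defect_rate else 0
        X.append([rng.gauss(shift + 2.0 * label, 1.0) for _ in range(2)])
        y.append(label)
    return X, y

rng = random.Random(0)
X_src, y_src = make_project(rng, 200, 0.0, 0.2)   # source project
X_tgt, y_tgt = make_project(rng, 100, 0.5, 0.3)   # target project, shifted
X_lab, y_lab = X_tgt[:20], y_tgt[:20]             # the few labeled target instances
ratios = [0.2, 0.35, 0.5, 0.65, 0.8]              # assumed class proportions
models = [fit_nb(*resample(X_src, y_src, r, rng)) for r in ratios]
probs = [[defect_prob(m, x) for m in models] for x in X_lab]
weights = search_weights(probs, y_lab, rng)
y_pred = ensemble_predict(models, weights, X_tgt[20:])
print("F1 on unlabeled target modules:", f1_score(y_tgt[20:], y_pred))
```

Precomputing `probs` once keeps the fitness evaluation cheap, since the base classifiers do not change during the search; only the weights evolve.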
Foundation items: National Natural Science Foundation of China (61202030, 61373012, 61202006, 71502125)
Reference text:

HE Ji-Yuan, MENG Zhao-Peng, CHEN Xiang, WANG Zan, FAN Xiang-Yu. Semi-Supervised Ensemble Learning Approach for Cross-Project Defect Prediction. Journal of Software, 2017, 28(6): 1455-1473