Code Smell Detection Approach Based on Pre-training Model and Multi-level Information

doi:10.13328/j.cnki.jos.006548

Journal of Software, 2022, Vol. 33, No. 5, pp. 1551-1568

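Corresponding author: Zhang Yang, E-mail: zhangyang@hebust.edu.cn
CLC number: TP311
Funding: National Natural Science Foundation of China (62172037); Key Project of the Natural Science Foundation of Hebei Province (18960106D); Key Project of the Scientific Research Program of Higher Education Institutions of Hebei Province (ZD2019093)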

Code Smell Detection Approach Based on Pre-training Model and Multi-level Information



Abstract:

Existing code smell detection approaches rely mainly on code structure information and heuristic rules and pay little attention to the semantic information embedded at different levels of the code, which leaves room to improve detection accuracy. To address this problem, this study proposes DeepSmell, a code smell detection approach based on a pre-trained model and multi-level information. First, a static analysis tool extracts code smell instances and multi-level code metrics from the source program, and the instances are labeled. Second, the level information related to code smells is obtained by parsing the source code into an abstract syntax tree, and the textual part of this information is combined with the code metrics to generate data samples. Finally, the BERT pre-trained model converts the textual information into word vectors, a GRU-LSTM model captures the latent semantic relationships among the levels of information, and a CNN model combined with an attention mechanism detects the code smells. In the experiments, 24 large real-world programs, including JUnit, Xalan, and SPECjbb2005, are used to build the training and test sets, and four kinds of code smells are detected: feature envy, long method, data class, and god class. The results show that, compared with existing detection approaches, DeepSmell improves average recall and F1 by 9.3% and 10.44%, respectively, while maintaining high precision.
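The precision, recall, and F1 figures quoted in the abstract are standard binary-classification measures. The following is a minimal Python sketch of how such scores could be computed for a single smell type. The `evaluate` helper, the `Scores` container, and the example labels are illustrative assumptions, not DeepSmell's evaluation code or data.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Scores:
    precision: float
    recall: float
    f1: float


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Scores:
    """Compute precision, recall, and F1 for binary labels (1 = smelly, 0 = clean)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return Scores(precision, recall, f1)


# Hypothetical labels for eight methods checked for one smell (e.g. long method).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]

scores = evaluate(y_true, y_pred)
print(scores)  # Scores(precision=0.75, recall=0.75, f1=0.75)
```

Recall is the share of real smell instances that are found, so a gain in recall at stable precision means fewer missed smells without additional false alarms, and F1 summarizes that trade-off as the harmonic mean of the two.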


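Cite this article

Zhang Yang, Dong Chunhao, Liu Hui, Ge Chuyan. Code smell detection approach based on pre-training model and multi-level information. Journal of Software, 2022, 33(5): 1551-1568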


History

Received: 2021-08-06
Last revised: 2021-10-09
Published online: 2022-01-28
Published: 2022-05-06
