Software defect localization is the task of identifying the program elements related to a software failure. Existing defect localization techniques produce results only at the function or statement level, and such coarse-grained results limit the efficiency and effectiveness of both manual debugging and automatic defect repair. This study focuses on fine-grained localization: identifying the specific code tokens that cause a defect. It builds abstract syntax tree (AST) paths for code tokens and proposes a fine-grained defect localization model based on a pointer neural network that predicts both the defective code token and the specific operation needed to repair it. The many defect patches available in open-source projects provide abundant training data, and paths built on the AST effectively capture program structure. Experimental results show that the trained model accurately predicts defective code tokens and significantly outperforms statistics-based and machine-learning-based baselines. To verify that fine-grained localization results can benefit automatic defect repair, two program repair workflows are designed on top of them: one uses a code completion tool to predict the correct token, and the other uses heuristic rules to find suitable repair elements. The results show that both approaches effectively address the overfitting problem in automatic software defect repair.
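To make the idea of an AST path for a code token concrete, the following is a minimal sketch that records, for each identifier or literal token, the sequence of node types from the AST root down to that token. It is an illustration under stated assumptions, not the study's implementation: it uses Python's standard `ast` module for convenience, the helper name `token_paths` is hypothetical, and the abstract does not specify how the study itself defines or encodes its paths.

```python
import ast


def token_paths(source):
    """Return (token, path) pairs for identifiers and literals in `source`.

    `path` is the list of AST node type names from the root to the token's
    node. This root-to-leaf form is an assumption made for illustration.
    """
    tree = ast.parse(source)
    results = []

    def visit(node, path):
        # Extend the path with the current node's type name.
        path = path + [type(node).__name__]
        if isinstance(node, ast.Name):
            results.append((node.id, path))
        elif isinstance(node, ast.Constant):
            results.append((repr(node.value), path))
        # Recurse into child nodes in field order.
        for child in ast.iter_child_nodes(node):
            visit(child, path)

    visit(tree, [])
    return results


if __name__ == "__main__":
    for token, path in token_paths("x = a + 1"):
        print(token, " > ".join(path))
```

Each token is paired with a path that reflects its syntactic context, so two occurrences of the same identifier in different constructs yield different paths. A token-level localization model could consume such pairs as input features, although the encoding actually used in the study may differ.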