Journal of Software:2019.30(4):962-980

(浙江大学 计算机科学与技术学院, 浙江 杭州 310007;Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia)
Code Clone Detection: A Literature Review
CHEN Qiu-Yuan,LI Shan-Ping,YAN Meng,XIA Xin
(College of Computer Science and Technology, Zhejiang University, Hangzhou 310007, China;Faculty of Information Technology, Monash University, Melbourne VIC 3800, Australia)
Chart / table
Similar Articles
Article :Browse 1267   Download 870
Received:August 21, 2018    Revised:October 07, 2018
> 中文摘要: 代码克隆(code clone),是指存在于代码库中两个及以上相同或者相似的源代码片段.代码克隆相关问题是软件工程领域研究的重要课题.代码克隆是软件开发中的常见现象,它能够提高效率,产生一定的正面效益.但是研究表明,代码克隆也会对软件系统的开发、维护产生负面的影响,包括降低软件稳定性,造成代码库冗余和软件缺陷传播等.代码克隆检测技术旨在寻找检测代码克隆的自动化方法,从而用较低成本减少代码克隆的负面效应.研究者们在代码克隆检测方面获得了一系列的检测技术成果,根据这些技术利用源代码信息的程度不同,可以将它们分为基于文本、词汇、语法、语义4个层次.现有的检测技术针对文本相似的克隆取得了有效的检测结果,但同时也面临着更高抽象层次克隆的挑战,亟待更先进的理论、技术来解决.着重从源代码表征方式角度入手,对近年来代码克隆检测研究进展进行了梳理和总结.主要内容包括:(1)根据源代码表征方式阐述并归类了现有的克隆检测方法;(2)总结了模型评估中使用的实验验证方法与性能评估指标;(3)从科学性、实用性和技术难点这3个方面归纳总结了代码克隆研究的关键问题,围绕数据标注、表征方法、模型构建和工程实践4个方面,阐述了问题的可能解决思路和研究的未来发展趋势.
中文关键词: 代码克隆  克隆检测  代码表征
Abstract:Code clone refers to more than two duplicate or similar code fragments existing in a software system. Code clone is a common phenomenon during software development which can facilitate development and has positive impacts on software system. However, research shows that code clone will also do harm to the development and maintenance of software system, including but not limited to the decline of stability, redundancy of source code repository, and propagation of software defects. Code clone is one of the most active research areas in software engineering. Therefore, various detection techniques are proposed to automatically detect code clone in software systems, which help improve software quality. There are a lot of achievements in this area, and these techniques can be categorized to text-based, lexis-based, syntax-based, and semantic-based categories. Current techniques have obtained effective results in text-based clone detection, but still challenges in detecting other types of code clone. More advanced and unified theoretic and technical guidelines are needed to improve code clone detection techniques. Therefore, in this paper, a literature review for code detection is presented especially from the perspective of source code representation. In summary, the contributions of this study are:(1) current code clone detection techniques are concluded and classified from the perspective of code representation; (2) the model validation and performance measures in model evaluation are concluded; and (3) the key issues of code clone research are summarized from three aspects:scientific, practical, and technical difficulties. The possible solutions to the problems and the future development of the research are elaborated, focusing on data annotation, characterization methods, model construction, and engineering practice.
文章编号:     中图分类号:    文献标志码:
Foundation items:
Reference text:


CHEN Qiu-Yuan,LI Shan-Ping,YAN Meng,XIA Xin.Code Clone Detection: A Literature Review.Journal of Software,2019,30(4):962-980