Function Template Mining and Retrieval Based on Code Clone Difference Analysis
Author:
Affiliation:

Clc Number:

TP311

Fund Project:

  • Article
  • |
  • Figures
  • |
  • Metrics
  • |
  • Reference
  • |
  • Related
  • |
  • Cited by
  • |
  • Materials
  • |
  • Comments
    Abstract:

    In the field of software engineering, code repositories contain a wealth of knowledge resources, which can provide developers with examples of programming practices. If repetitive patterns, frequently occurring in source code, can be effectively extracted in the form of code templates, programming efficiency could be significantly improved. In current practice, developers often reuse existing solutions by searching through source code. However, this method typically generates a large number of similar and redundant results, increasing the burden of subsequent filtering. Moreover, template mining techniques based on cloned code often fail to cover extensive patterns constructed from dispersed small clones, thereby limiting the practicality of the templates. A method is proposed for extracting and retrieving code templates based on code clone detection. This method achieves more efficient function-level code template extraction by stitching together multiple fragment-level clones and extracting and aggregating the shared parts of method-level clones and addresses the issue of template quality. Based on the mined code templates, this study comes up with a triplet representation method for code structural features that effectively supplements plain text features, and implements an efficient and concise structural representation. In addition, this study presents a template feature retrieval method that combines structural and textual search to retrieve these templates by matching features of the programming context. The tool implemented based on this method, CodeSculptor, demonstrates its significant capability to extract high-quality code templates in a test against a codebase containing 45 high-quality Java open-source projects. The results show that the templates mined by the tool achieve an average code reduction of 60.87%, with 92.09% produced by stitching fragment-level clones, a proportion of templates that is not identifiable by traditional method. It proves the superior performance of the method in recognizing and constructing code templates. Furthermore, the accuracy of the top-5 search results in the code template search and recommendation is 96.87%. A preliminary case study on 9600 randomly selected templates reveals that most of the sampled code templates are complete and coherent in semantics, thus affirming their practicality. Nonetheless, there are a few meaningless templates, highlighting the future potential to refine the proposed template extraction strategy. The user research further shows that code development tasks can be done more efficiently with CodeSculptor.

    Reference
    Related
    Cited by
Get Citation

肖泉彬,陈源,吴毅坚,彭鑫.基于代码克隆差异分析的函数模板挖掘和检索方法.软件学报,,():1-20

Copy
Share
Article Metrics
  • Abstract:
  • PDF:
  • HTML:
  • Cited by:
History
  • Received:January 26,2024
  • Revised:April 07,2024
  • Adopted:
  • Online: July 03,2024
  • Published:
You are the firstVisitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address:4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code:100190
Phone:010-62562563 Fax:010-62562533 Email:jos@iscas.ac.cn
Technical Support:Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063