Automatic Code Semantic Tag Generation based on Software Knowledge Graph

Corresponding author: Peng Xin, E-mail: pengxin@fudan.edu.cn

Fund project: National Natural Science Foundation of China (61972098)

Abstract:

Code snippets in open-source and enterprise software projects and on software development websites are important software development resources. However, developers' code search needs often reflect the high-level intentions and topics of code, which are difficult to satisfy precisely with information-retrieval-based code search over code text. Semantic tags that reflect the overall intention and topics of a code snippet are therefore valuable for improving code search and assisting code comprehension. Existing tag generation techniques mainly target textual content or rely on historical data, and cannot meet the need for large-scale semantic annotation of code in support of search and comprehension. To address this problem, this paper proposes KGCodeTagger, an approach that automatically generates semantic tags for code based on a software knowledge graph. KGCodeTagger constructs the software knowledge graph from concepts and relations extracted from API documentation and software development Q&A text, and uses it as the basis for tag generation. Given a code snippet, KGCodeTagger identifies and extracts general API invocations and concept mentions, and links them to the corresponding concepts in the knowledge graph. It then identifies other concepts related to the linked concepts as candidates, and ranks the candidates by diversity and representativeness to produce the final semantic tags. We evaluate each step of the knowledge graph construction, and assess the quality of the generated tags by comparison with several existing baseline approaches. The results show that the knowledge graph construction steps are reasonable and effective, and that the generated semantic tags are of high quality, meaningful, and able to help developers quickly understand the intention of the code.
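The abstract describes the tagging pipeline only at a high level (link API invocations to graph concepts, collect related concepts as candidates, rank by diversity and representativeness). The following is a minimal sketch of that flow, not KGCodeTagger's actual implementation. Everything in it is an assumption made for illustration: the toy graph `TOY_GRAPH`, the name index `API_INDEX`, the word-overlap similarity, and the greedy selection rule (a maximal-marginal-relevance style trade-off swapped in here because the abstract does not specify the ranking procedure).

```python
import re
from collections import Counter

# Hypothetical toy knowledge graph: linked API concept -> related concepts.
TOY_GRAPH = {
    "java.io.BufferedReader": ["buffered input", "character stream", "file I/O"],
    "java.io.FileReader": ["file reading", "character stream", "file I/O"],
    "java.util.ArrayList": ["dynamic array", "collection"],
}
# Hypothetical index from simple class names to graph nodes.
API_INDEX = {name.rsplit(".", 1)[1]: name for name in TOY_GRAPH}


def link_mentions(code):
    """Link identifiers in the code to graph nodes, in order of first use."""
    linked = []
    for token in re.findall(r"[A-Za-z_]\w*", code):
        node = API_INDEX.get(token)
        if node is not None and node not in linked:
            linked.append(node)
    return linked


def candidate_counts(linked):
    """Count how many linked nodes each related concept is attached to."""
    counts = Counter()
    for node in linked:
        counts.update(TOY_GRAPH[node])
    return counts


def word_overlap(a, b):
    """Jaccard similarity over lower-cased words (assumed similarity measure)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)


def select_tags(linked, k=3, trade_off=0.7):
    """Greedily pick up to k tags, trading representativeness against redundancy."""
    counts = candidate_counts(linked)
    remaining = sorted(counts)  # alphabetical order makes tie-breaking deterministic
    selected = []

    def score(tag):
        # Representativeness: share of linked nodes related to this candidate.
        representativeness = counts[tag] / len(linked)
        # Redundancy: highest similarity to any tag already selected.
        redundancy = max((word_overlap(tag, s) for s in selected), default=0.0)
        return trade_off * representativeness - (1 - trade_off) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


snippet = (
    'BufferedReader r = new BufferedReader(new FileReader("a.txt")); '
    "List<String> xs = new ArrayList<>();"
)
print(select_tags(link_mentions(snippet)))
```

In this sketch, a candidate related to several linked APIs scores higher on representativeness, while a candidate that shares words with an already selected tag is penalised, so the final tags tend to cover different aspects of the snippet. A real system would replace the dictionary lookup with proper entity linking and the word overlap with a graph-based or embedding-based similarity.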

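Cite this article

Xing Shuangshuang, Liu Mingwei, Peng Xin. Automatic Code Semantic Tag Generation based on Software Knowledge Graph. Journal of Software (软件学报).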

History
  • Received: 2020-12-25
  • Last revised: 2021-02-13
  • Published online: 2021-05-21