Enhancing Code Summarization with Dependency-aware Hierarchical Neural Networks
基于依赖感知分层神经网络的代码注释增强方法

Authors: 张育博, 姚开春, 张立波, 武延军, 赵琛
CLC Number: TP391.1

    Abstract:

    As an emerging technique in software engineering, automatic source code summarization aims to generate natural-language descriptions for given code snippets. State-of-the-art code summarization techniques employ encoder-decoder neural models: the encoder extracts semantic representations of the source code, and the decoder translates them into a human-readable summary. However, many existing approaches treat the input code snippet as a standalone function, overlooking the contextual dependencies between the target function and the subfunctions it invokes. Ignoring these dependencies can omit crucial semantic information and degrade the quality of the generated summary. To this end, this paper introduces DHCS, a dependency-aware hierarchical code summarization neural model that explicitly models the hierarchical dependencies between the target function and its subfunctions. Our approach employs a hierarchical encoder consisting of a subfunction encoder and a target function encoder, which captures local and contextual semantic representations effectively. We further introduce a self-supervised task, masked subfunction prediction, to enhance the representation learning of subfunctions. In addition, we mine the topic distributions of subfunctions and feed them into the summary decoder through a topic-aware copy mechanism, which extracts key information directly from subfunctions and thereby generates more effective summaries for the target function. Finally, extensive experiments on three real-world datasets constructed for Python, Java, and Go clearly validate the effectiveness of our approach.
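    To make the hierarchical-encoder idea concrete, below is a minimal PyTorch-style sketch in the spirit of DHCS as described in the abstract. All class names, dimensions, the mean-pooling step, and the fusion of subfunction vectors with target-function tokens are illustrative assumptions based only on the abstract, not the authors' implementation; the masked subfunction prediction task and the topic-aware copy mechanism are omitted here.

    # A hypothetical sketch, not the released DHCS code.
    import torch
    import torch.nn as nn

    class HierarchicalCodeEncoder(nn.Module):
        def __init__(self, vocab_size: int, d_model: int = 256, n_heads: int = 4, n_layers: int = 2):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, d_model)
            # Local encoder: one pass per invoked subfunction.
            self.subfunction_encoder = nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True), n_layers
            )
            # Contextual encoder: target-function tokens attend to subfunction summaries.
            self.target_encoder = nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True), n_layers
            )

        def forward(self, target_tokens: torch.Tensor, subfunction_tokens: torch.Tensor):
            # target_tokens:      (batch, target_len)       token ids of the target function
            # subfunction_tokens: (batch, n_subs, sub_len)  token ids of each invoked subfunction
            b, n_subs, sub_len = subfunction_tokens.shape

            # 1) Encode each subfunction independently and mean-pool it into one vector
            #    (local semantic representation).
            sub_emb = self.embed(subfunction_tokens.view(b * n_subs, sub_len))
            sub_repr = self.subfunction_encoder(sub_emb).mean(dim=1)   # (b * n_subs, d_model)
            sub_repr = sub_repr.view(b, n_subs, -1)                    # (b, n_subs, d_model)

            # 2) Prepend the subfunction vectors to the target-token embeddings so the
            #    target encoder can model the call-dependency context.
            tgt_emb = self.embed(target_tokens)                        # (b, target_len, d_model)
            fused = torch.cat([sub_repr, tgt_emb], dim=1)
            return self.target_encoder(fused), sub_repr

    # Toy usage: 2 target functions, each calling 3 subfunctions.
    enc = HierarchicalCodeEncoder(vocab_size=5000)
    target = torch.randint(0, 5000, (2, 50))
    subs = torch.randint(0, 5000, (2, 3, 20))
    ctx, sub_vectors = enc(target, subs)
    print(ctx.shape, sub_vectors.shape)  # torch.Size([2, 53, 256]) torch.Size([2, 3, 256])

    In a full model of this kind, the returned subfunction vectors would feed the masked subfunction prediction objective and the topic-aware copy mechanism in the decoder, while the fused contextual states would serve as the encoder memory for summary generation.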

Cite this article:

张育博, 姚开春, 张立波, 武延军, 赵琛. 基于依赖感知分层神经网络的代码注释增强方法 (Enhancing Code Summarization with Dependency-aware Hierarchical Neural Networks). 软件学报 (Journal of Software).

History
  • Received: 2025-03-21
  • Revised: 2025-03-21
  • Published online: 2025-09-02