主页期刊介绍编委会编辑部服务介绍道德声明在线审稿编委办公编辑办公English
2020-2021年专刊出版计划 微信服务介绍 最新一期:2020年第6期
     
在线出版
各期目录
纸质出版
分辑系列
论文检索
论文排行
综述文章
专刊文章
美文分享
各期封面
E-mail Alerts
RSS
旧版入口
中国科学院软件研究所
  
投稿指南 问题解答 下载区 收费标准 在线投稿
杨博,张能,李善平,夏鑫.智能代码补全研究综述.软件学报,2020,31(5):1435-1453
智能代码补全研究综述
Survey of Intelligent Code Completion
投稿时间:2019-08-19  修订日期:2019-10-28
DOI:10.13328/j.cnki.jos.005966
中文关键词:  代码补全  代码表征  软件开发工具
英文关键词:code completion  code representation  software development tool
基金项目:
作者单位E-mail
杨博 浙江大学 计算机科学与技术学院, 浙江 杭州 310007  
张能 浙江大学 计算机科学与技术学院, 浙江 杭州 310007  
李善平 浙江大学 计算机科学与技术学院, 浙江 杭州 310007  
夏鑫 Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia xin.xia@monash.edu 
摘要点击次数: 628
全文下载次数: 1695
中文摘要:
      代码补全(code completion)是自动化软件开发的重要功能之一,是大多数现代集成开发环境和源代码编辑器的重要组件.代码补全提供即时类名、方法名和关键字等预测,辅助开发人员编写程序,直观提高软件开发效率.近年来,开源软件社区中源代码和数据规模不断扩大,人工智能技术取得了卓越进展,这对自动化软件开发技术产生了极大的促进作用.智能代码补全(intelligent code completion)根据源代码建立语言模型,从语料库学习已有代码特征,根据待补全位置的上下文代码特征在语料库中检索最相似的匹配项进行推荐和预测.相对于传统代码补全,智能代码补全凭借其高准确率、多补全形式、可学习迭代的特性成为软件工程领域的热门方向之一.研究者们在智能代码补全方面进行了一系列研究,根据这些方法如何表征和利用源代码信息的不同方式,可以将它们分为基于编程语言表征和基于统计语言表征两个研究方向,其中,基于编程语言表征又分为标识符序列、抽象语法树、控制/数据流图这3个类别,基于统计语言表征又分为N-gram模型、神经网络模型这2个类别.从代码表征的角度入手,对近年来代码补全方法研究进展进行梳理和总结,主要内容包括:(1)根据代码表征方式阐述并归类了现有的智能代码补全方法;(2)总结了代码补全的一般过程和模型评估中的模型验证方法与性能评估指标;(3)归纳了智能代码补全的主要挑战;(4)展望了智能代码补全的未来发展方向.
英文摘要:
      Code completion is one of the crucial functions of automation software development. It is an essential component of most modern integrated development environments and source code editors. Code completion provides predictions such as instant class names, method names, keywords, and assists developer to code, which improves the efficiency of software development intuitively. In recent years, with the expanding of the source code and data scale in the open-source software community, and outstanding progress in artificial intelligence technology, the automation software development technology has been much promoted. Intelligent code completion builds a language model for source code, learns features from the existing code corpus, and retrieves the most similar matches in the corpus for recommendation and prediction based on the context code features around the position to be completed. Compared to traditional code completion, intelligence code completion has become one of the hot trends in the field of software engineering with its characteristics like high accuracy, multiple completion forms, and iterative learning ability. Researchers have conducted a series of researches on intelligent code completion. According to the different forms that these completion methods represent and utilize source code information, they can be divided into two research directions: programming language representation and statistical language representation. The programming language is divided into three types: token sequences, abstract syntax tree, and control/data flow graph. The statistical language also has two types: n-gram model and the neural network model. This paper starts from the perspective of code representation and summarizes the research progress of code completion methods in recent years. The main contents include: (1) expounding and classifying existing intelligent code completion methods according to code representation; (2) summarizing the experimental verification methods and performance evaluation indicators used in model evaluation; (3) summarizing the critical issues of intelligent code completion; (4) looking forward to the future development of intelligent code completion.
HTML  下载PDF全文  查看/发表评论  下载PDF阅读器
 

京公网安备 11040202500064号

主办单位:中国科学院软件研究所 中国计算机学会 京ICP备05046678号-4
编辑部电话:+86-10-62562563 E-mail: jos@iscas.ac.cn
Copyright 中国科学院软件研究所《软件学报》版权所有 All Rights Reserved
本刊全文数据库版权所有,未经许可,不得转载,本刊保留追究法律责任的权利