Journal of Software, 2021, 32(4): 1023-1038

Fusing Code and Documents to Mine Software Functional Features
SHEN Qi, QIAN Ying, ZOU Yan-Zhen, WU Shi-Jun, XIE Bing
(Key Laboratory of High Confidence Software Technologies of Ministry of Education (Peking University), Beijing 100871, China; School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China)
Received: September 13, 2020; Revised: October 26, 2020
Abstract: In software reuse, a concise and clear natural-language description of a software project's functions is the basis for quickly understanding a candidate project or code library. However, open source software often lacks high-quality functional documentation, which makes this process more complex and difficult. This study proposes a functional feature mining approach that fuses code and documentation. It describes functional features as verb-object phrases, automatically extracts them by iteratively mining source code and software documents such as Stack Overflow discussion threads, associates a corresponding API usage example with each functional feature, and builds a hierarchical functional feature view for users. In experiments on several open source software projects and their related heterogeneous data, the functional features generated by the approach cover 95.38% of the functions listed in the official documentation, and the approach achieves 93.78% and 92.57% accuracy for mined sentences and functional features, respectively. Compared with two existing tools, TaskNav and APITasks, the approach improves average accuracy by 28.78% and 11.56%, respectively.
CLC number: TP311
Foundation items: National Natural Science Foundation of China (61972006); National Natural Science Fund for Distinguished Young Scholars (61525201)
Reference text:
SHEN Qi, QIAN Ying, ZOU Yan-Zhen, WU Shi-Jun, XIE Bing. Fusing Code and Documents to Mine Software Functional Features. Journal of Software, 2021, 32(4): 1023-1038