Prediction Method for Question Deletion in Software Question and Answer Community

Corresponding author:

Zhang Li, lily@buaa.edu.cn

CLC number:

TP311

Fund Project:

Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2018AAA0102304); National Natural Science Foundation of China (62177003); Fundamental Research Funds for the Central Universities (YWF-20-BJ-J-1018)

Abstract:

Stack Overflow is one of the most popular software question and answer communities, where users post questions and receive answers from other users. To maintain question quality, the site needs to find and delete low-quality or off-topic questions as quickly as possible. Currently, Stack Overflow relies mainly on manual inspection to find questions that need to be deleted. This approach often fails to find and delete such questions in time, and it adds to the workload of community administrators. To find questions that need to be deleted quickly, this study proposes MulPredictor, a method that automatically predicts question deletion. The method extracts the semantic content features, semantic statistical features, and meta features of a question, and uses a random forest classifier to compute the probability that the question will be deleted. Experimental results show that, compared with the existing methods DelPredictor and NLPPredictor, MulPredictor improves accuracy by 16.34% and 12.78%, respectively, on the balanced test set, and by 12.38% and 14.14%, respectively, on the random test set. The study also analyzes which features matter for question deletion and finds that features of the code segment, the question's title, and the first paragraph of the question's body have an important impact on question deletion.
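
The pipeline described in the abstract (extract features of a question, then let a random forest estimate the deletion probability) can be illustrated with a minimal pure-Python sketch. It is not MulPredictor itself: the abstract does not list the concrete features or the classifier settings, so the four features, the `<code>` marker check, the function names, and the toy training rows below are all assumptions made for illustration. The semantic content features are omitted entirely, and each tree is reduced to a single split (a decision stump) to keep the example short.

```python
import random


def extract_features(title, body, asker_reputation):
    """Toy stand-ins for statistical and meta features (all hypothetical)."""
    paragraphs = [p for p in body.split("\n\n") if p.strip()]
    first_paragraph = paragraphs[0] if paragraphs else ""
    return [
        float(len(title.split())),            # title length in words
        float(len(first_paragraph.split())),  # first-paragraph length in words
        1.0 if "<code>" in body else 0.0,     # body contains a code segment?
        float(asker_reputation),              # assumed meta feature
    ]


def fit_stump(rows, labels, feature_ids):
    """Find the one-split tree with the fewest training errors."""
    best = None
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            # left_label is the class predicted when row[f] <= t
            for left_label in (0, 1):
                errors = sum(
                    (left_label if r[f] <= t else 1 - left_label) != y
                    for r, y in zip(rows, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, t, left_label)
    _, f, t, left_label = best
    return f, t, left_label


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Bagging plus random feature subsets, with stumps as the trees."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    k = max(1, int(n_features ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]   # bootstrap sample
        feats = rng.sample(range(n_features), k)          # random feature subset
        forest.append(
            fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats)
        )
    return forest


def deletion_probability(forest, row):
    """Fraction of trees that vote 'deleted' (label 1)."""
    votes = sum(left if row[f] <= t else 1 - left for f, t, left in forest)
    return votes / len(forest)


# Invented toy data: label 1 = deleted, label 0 = kept.
TRAIN_ROWS = [
    [2.0, 3.0, 0.0, 1.0], [3.0, 5.0, 0.0, 5.0],
    [1.0, 2.0, 0.0, 1.0], [4.0, 6.0, 0.0, 10.0],
    [8.0, 20.0, 1.0, 200.0], [10.0, 25.0, 1.0, 500.0],
    [7.0, 18.0, 1.0, 150.0], [12.0, 30.0, 1.0, 900.0],
]
TRAIN_LABELS = [1, 1, 1, 1, 0, 0, 0, 0]

forest = fit_forest(TRAIN_ROWS, TRAIN_LABELS)
question = extract_features(
    "help", "plz fix", asker_reputation=1
)
print(deletion_probability(forest, question))
```

In practice a library implementation with full-depth trees would replace the stump forest, and the vote fraction would be thresholded (or ranked) to flag questions for review.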

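Cite this article

Jiang Jing, Miao Meng, Zhao Lixian, Zhang Li. Prediction method for question deletion in software question and answer community. Journal of Software, 2022, 33(5): 1699-1710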

History
  • Received: 2021-08-10
  • Last revised: 2021-10-09
  • Published online: 2022-01-28