Survey on Visual Question Answering

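About the authors:

Bao Xigang (包希港) (1997-), male, Ph.D. candidate. His main research interests include visual question answering and knowledge base question answering.
Xiao Kejing (肖克晶) (1991-), female, Ph.D. candidate. Her main research interests include natural language processing, deep learning, and data mining.
Zhou Chunlai (周春来) (1976-), male, Ph.D., associate professor, CCF professional member. His main research interest is uncertainty in artificial intelligence.
Qin Biao (覃飙) (1972-), male, Ph.D., associate professor, doctoral supervisor, CCF professional member. His main research interests include artificial intelligence, causal analysis, and uncertain databases.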

Corresponding author:

Qin Biao (覃飙), E-mail: qinbiao@ruc.edu.cn

Funding:

National Natural Science Foundation of China (61772534, 61732006)


    Abstract:

    Visual question answering (VQA) is an interdisciplinary task at the intersection of computer vision and natural language processing, and it has attracted extensive attention in recent years. In VQA, an algorithm must answer questions about a given image (or video). The first VQA dataset was released in 2014. Since then, several large-scale datasets have been released over the past five years, and a large number of algorithms have been built on them. Existing surveys have focused on summarizing how the VQA task has developed. In recent years, however, studies have found that VQA models rely heavily on language bias and on the distribution of the datasets. In particular, the accuracy of many models has dropped sharply since the release of the VQA-CP dataset. This paper presents in detail the algorithms proposed and the datasets released in recent years, with particular attention to research on making algorithms more robust. VQA algorithms are categorized and summarized, and their motivations, details, and limitations are described. Finally, the challenges and future prospects of VQA are discussed.

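Cite this article:

Bao Xigang, Zhou Chunlai, Xiao Kejing, Qin Biao. Survey on visual question answering. Journal of Software (软件学报), 2021, 32(8): 2522-2544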

History:
  • Received: 2020-07-09
  • Last revised: 2020-10-02
  • Published online: 2021-01-15
  • Published: 2021-08-06
Copyright: Institute of Software, Chinese Academy of Sciences