Video Question Answering Method Based on Self-supervised Graph Neural Network with Contrastive Learning

Fund Project: Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2020AAA0106200); National Natural Science Foundation of China (62036012, U21B2044, 62102415, 62072286, 61721004); Zhejiang Lab Open Research Project (No. 2022RC0AB02); CCF-Hikvision Open Fund (20210004)

Abstract:

Video question answering (VideoQA) is a cross-modal understanding task: given a video and a question about it, a model must produce the answer through interaction between the semantic information of different modalities. In recent years, graph neural networks have made remarkable progress on VideoQA thanks to their strong capabilities in cross-modal information fusion and inference. However, most existing graph neural network approaches suffer from inherent deficiencies, namely overfitting or oversmoothing, weak robustness, and weak generalization, which keep the performance of VideoQA models from improving further. In view of the effectiveness and robustness of self-supervised contrastive learning methods in pre-training techniques, this study proposes GMC, a self-supervised graph contrastive learning framework for VideoQA based on the idea of graph data augmentation. The framework applies two independent data augmentation operations, one to nodes and one to edges, to generate dissimilar subsamples, and improves the consistency between the predicted distributions of the original sample and of the generated subsamples in order to enhance the accuracy and robustness of VideoQA models. The effectiveness of the proposed framework is verified through experimental comparisons with existing state-of-the-art VideoQA models and with different GMC variants on publicly available VideoQA data.
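The abstract names the two ingredients of the framework (node-level and edge-level graph augmentation, plus a consistency objective between predicted distributions) but not their exact form. The sketch below is one possible reading, not the authors' implementation: it assumes node augmentation means zeroing the features of randomly chosen nodes and detaching them, edge augmentation means randomly removing undirected edges, and consistency is measured by KL divergence. All function names (`mask_nodes`, `drop_edges`, `consistency_loss`) and the example logits are hypothetical.

```python
import math
import random


def mask_nodes(features, adj, drop_prob, rng):
    """Node-level augmentation (assumed form): zero the features of randomly
    chosen nodes and remove every edge incident to them."""
    n = len(features)
    dropped = {i for i in range(n) if rng.random() < drop_prob}
    new_feats = [
        [0.0] * len(row) if i in dropped else list(row)
        for i, row in enumerate(features)
    ]
    new_adj = [
        [0 if (i in dropped or j in dropped) else adj[i][j] for j in range(n)]
        for i in range(n)
    ]
    return new_feats, new_adj


def drop_edges(adj, drop_prob, rng):
    """Edge-level augmentation (assumed form): remove each undirected edge
    independently with probability drop_prob, keeping the matrix symmetric."""
    n = len(adj)
    new_adj = [list(row) for row in adj]
    for i in range(n):
        for j in range(i + 1, n):
            if adj[i][j] and rng.random() < drop_prob:
                new_adj[i][j] = 0
                new_adj[j][i] = 0
    return new_adj


def softmax(logits):
    """Numerically stable softmax over a list of answer logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def consistency_loss(p_orig, p_augs):
    """Mean KL divergence from the original prediction to each augmented
    prediction (an assumed choice of consistency measure)."""
    return sum(kl_divergence(p_orig, q) for q in p_augs) / len(p_augs)


if __name__ == "__main__":
    rng = random.Random(0)
    features = [[1.0, 0.5], [0.2, 0.3], [0.9, 0.1]]
    adj = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]

    node_view = mask_nodes(features, adj, 0.3, rng)
    edge_view = drop_edges(adj, 0.3, rng)

    # Placeholder answer logits; in practice a graph network would produce
    # them from the original graph and from each augmented view.
    p_orig = softmax([2.0, 0.5, -1.0])
    p_node = softmax([1.8, 0.7, -0.9])
    p_edge = softmax([2.1, 0.4, -1.2])
    print(consistency_loss(p_orig, [p_node, p_edge]))
```

In a training loop, a term like this would typically be added to the usual answer-classification loss, so that predictions stay stable when the graph is perturbed. How the two terms are weighted is not stated in the abstract.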

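Cite this article

Yao Xuan, Gao Junyu, Xu Changsheng. Video question answering method based on self-supervised graph neural network with contrastive learning. Journal of Software, 2023, (5): 0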
History
  • Received: 2022-04-18
  • Last revised: 2022-05-29
  • Published online: 2022-09-20