Image Classification Method Based on Cross-modal Privileged Information Enhancement

CLC number: TP391

Funding: Shandong Provincial Excellent Young Scientists Fund (Overseas) Program (2022HWYQ-048); Jinan Science and Technology Bureau "New 20 Policies for Universities" Funding Project, Introduced Innovation Team Program (2021GXRC073); National Key Research and Development Program of China (2021YFC3300203)

    Abstract:

    Image classification performance is limited by the diversity of visual information and by background noise. Existing work typically applies cross-modal constraints or heterogeneous feature alignment to learn highly discriminative visual representations. However, differences in feature distribution caused by modal heterogeneity hinder effective learning of these representations. To address this problem, this study proposes CMIF, an image classification framework based on cross-modal semantic information inference and fusion. CMIF introduces semantic descriptions of images and statistical prior knowledge as privileged information, and uses the privileged information learning paradigm to guide the mapping of image features from the visual space to the semantic space during training. A class-aware information selection (CIS) algorithm is proposed to learn cross-modally enhanced image representations. To handle the discrepancy between heterogeneous features in representation learning, a partial heterogeneous alignment (PHA) algorithm aligns visual features with the semantic features extracted from the privileged information. To further suppress interference from visual noise in the semantic space, the graph-fusion-based CIS algorithm selects the key information in the reconstructed semantic representation, which then serves as an effective complement to the visual prediction. Experiments on the cross-modal classification datasets VireoFood-172 and NUS-WIDE show that CMIF learns robust semantic features of images and, as a general framework, yields stable performance improvements on both the convolution-based ResNet-50 and the Transformer-based ViT image classification models.
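The training-time role of privileged information described in the abstract can be illustrated with a toy sketch. This is not the paper's PHA or CIS algorithm, whose details the abstract does not give. It only assumes a generic setup in which a linear map from visual features to a semantic space is fitted against text-derived (privileged) vectors during training, and inference then uses the image features alone. All function names, feature values, and hyperparameters below are hypothetical and chosen purely for illustration.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def matvec(W: Sequence[Sequence[float]], x: Sequence[float]) -> Vector:
    """Multiply matrix W (one row per semantic dimension) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def fit_visual_to_semantic(visual: Sequence[Vector], semantic: Sequence[Vector],
                           lr: float = 0.1, epochs: int = 200) -> List[Vector]:
    """Fit W by per-sample gradient descent so that W @ v approximates s.

    The semantic vectors s are the privileged information: they are only
    consumed here, at training time.
    """
    d_sem, d_vis = len(semantic[0]), len(visual[0])
    W = [[0.0] * d_vis for _ in range(d_sem)]
    for _ in range(epochs):
        for v, s in zip(visual, semantic):
            pred = matvec(W, v)
            for i in range(d_sem):
                err = pred[i] - s[i]  # gradient of 0.5 * squared error
                for j in range(d_vis):
                    W[i][j] -= lr * err * v[j]
    return W


def class_prototypes(semantic: Sequence[Vector],
                     labels: Sequence[int]) -> Dict[int, Vector]:
    """Average the privileged semantic vectors of each class."""
    sums: Dict[int, Vector] = {}
    counts: Dict[int, int] = {}
    for s, y in zip(semantic, labels):
        acc = sums.setdefault(y, [0.0] * len(s))
        for i, si in enumerate(s):
            acc[i] += si
        counts[y] = counts.get(y, 0) + 1
    return {y: [a / counts[y] for a in acc] for y, acc in sums.items()}


def predict(W: Sequence[Vector], prototypes: Dict[int, Vector], v: Vector) -> int:
    """Classify from visual features only: map to semantic space, pick nearest prototype."""
    s_hat = matvec(W, v)

    def sq_dist(p: Vector) -> float:
        return sum((a - b) ** 2 for a, b in zip(s_hat, p))

    return min(prototypes, key=lambda y: sq_dist(prototypes[y]))


# Made-up toy data: the third visual dimension plays the role of background noise.
train_visual = [[1.0, 0.0, 0.3], [0.9, 0.1, -0.2], [0.0, 1.0, 0.2], [0.1, 0.9, -0.3]]
train_semantic = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # privileged, training only
train_labels = [0, 0, 1, 1]

W = fit_visual_to_semantic(train_visual, train_semantic)
prototypes = class_prototypes(train_semantic, train_labels)

# At test time no semantic description is available; only the image features are used.
print(predict(W, prototypes, [0.8, 0.2, 0.1]))
print(predict(W, prototypes, [0.2, 0.8, 0.0]))
```

In this sketch the semantic vectors never appear in `predict`, which is the defining property of learning with privileged information. A closer analogue of the framework in the abstract would presumably replace the linear map with a deep backbone such as ResNet-50 or ViT and fuse the semantic prediction with the visual one, but those specifics are not stated in the abstract.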

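Cite this article

Li Xiangxian, Zheng Yuze, Ma Haokai, Qi Zhuang, Yan Xiaoshuo, Meng Xiangxu, Meng Lei. Image classification method based on cross-modal privileged information enhancement. Journal of Software (软件学报),,():1-17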

History
  • Received: 2022-12-06
  • Last revised: 2023-03-21
  • Published online: 2024-01-31