Pretraining-Driven Multimodal Boundary-Aware Vision Transformer
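Fund Project:

Sub-projects of the National Key Research and Development Program of China (2018YFB0804202, 2018YFB0804203); Sub-project of the Regional Joint Fund of the National Natural Science Foundation of China (U19A2057); General Program of the National Natural Science Foundation of China (61876070); Jilin University 2021 "Interdisciplinary Integration and Innovation" Young Scholars Free Exploration Project (JLUXKJC2021QZ01)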




    Abstract:

    Convolutional neural networks (CNNs) have achieved steady performance gains in image forgery detection. However, existing methods still struggle in realistic scenarios where the tampering method is unknown. They cannot effectively capture long-range dependencies in the input image, so they fail to mitigate recognition bias, and detection accuracy suffers. In addition, labeling is difficult, so image forgery detection datasets usually lack accurate pixel-level annotations. To address these problems, this paper proposes a pretraining-driven multimodal boundary-aware vision Transformer. First, the method introduces the frequency-domain modality of the image and combines it with the RGB spatial domain as a multimodal embedding, in order to capture subtle forgery traces that are invisible in the RGB domain. Second, the encoder of the backbone network is pretrained on ImageNet to alleviate the shortage of training samples. Third, a Transformer module is appended to the end of this encoder so that the model captures both low-level spatial details and global context, which improves its overall representation ability. Finally, blurred boundaries of forged regions make localization difficult, so this paper builds a boundary-aware module to address this. The module uses the noise distribution obtained by a Scharr convolutional layer to focus on noise information rather than semantic content. It then uses boundary residual blocks to sharpen boundary information, which improves the model's boundary segmentation performance. Extensive experiments show that the proposed method outperforms existing image forgery detection methods in recognition accuracy. It also shows good generalization and robustness across different forgery methods.
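The abstract names two concrete signal-processing ingredients. The first is a frequency-domain modality combined with RGB. The second is a Scharr convolutional layer used to expose noise and edge cues. Below is a minimal pure-Python sketch of both. It is an illustration under stated assumptions, not the authors' implementation:

- The abstract does not say which frequency transform is used, so a log-magnitude 2-D DFT of the grayscale image is assumed here.
- The BT.601 grayscale weights and the border replication are illustrative choices.
- The function names (`scharr_magnitude`, `dft_log_magnitude`, `multimodal_stack`) are hypothetical.
- In a real model these would be fixed convolution/transform layers operating on tensors inside the network.

```python
import cmath
import math

# Standard 3x3 Scharr kernels for horizontal (x) and vertical (y) derivatives.
SCHARR_X = [[-3, 0, 3],
            [-10, 0, 10],
            [-3, 0, 3]]
SCHARR_Y = [[-3, -10, -3],
            [0, 0, 0],
            [3, 10, 3]]


def correlate3x3(img, kernel):
    """Apply a 3x3 kernel as cross-correlation (what CNN conv layers compute),
    replicating border pixels so the output has the same size as the input."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    # Clamp coordinates: pixels outside the image reuse the nearest border pixel.
                    y = min(max(i + di, 0), h - 1)
                    x = min(max(j + dj, 0), w - 1)
                    acc += kernel[di + 1][dj + 1] * img[y][x]
            out[i][j] = acc
    return out


def scharr_magnitude(img):
    """Hypothetical helper: per-pixel gradient magnitude sqrt(gx^2 + gy^2) from fixed Scharr kernels."""
    gx = correlate3x3(img, SCHARR_X)
    gy = correlate3x3(img, SCHARR_Y)
    return [[math.hypot(a, b) for a, b in zip(rx, ry)] for rx, ry in zip(gx, gy)]


def dft_log_magnitude(img):
    """Hypothetical frequency channel: log(1 + |F(u, v)|) of a naive 2-D DFT.
    O((H*W)^2) cost, so this is only suitable for toy inputs."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            s = 0j
            for y in range(h):
                for x in range(w):
                    s += img[y][x] * cmath.exp(-2j * math.pi * (u * y / h + v * x / w))
            out[u][v] = math.log1p(abs(s))
    return out


def multimodal_stack(rgb):
    """Hypothetical multimodal input: append the frequency channel to each RGB pixel.
    rgb is an H x W list of (r, g, b) tuples; returns an H x W list of 4-tuples."""
    gray = [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]
    freq = dft_log_magnitude(gray)
    return [[(*rgb[i][j], freq[i][j]) for j in range(len(rgb[0]))]
            for i in range(len(rgb))]
```

Consider a 4×5 image whose left two columns are 0 and whose remaining columns are 1. On this input, `scharr_magnitude` returns 16.0 in the two columns adjacent to the step and 0.0 elsewhere. The outer weight columns of the x-kernel sum to 3 + 10 + 3 = 16, and constant regions cancel exactly. This cancellation in smooth areas is why a Scharr-based layer can emphasize local discontinuities (edges, splicing seams, noise) over large-scale semantic content.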

Cite this article:

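SHI Zenan (石泽男), CHEN Haipeng (陈海鹏), ZHANG Dong (张冬), SHEN Xuanjing (申铉京). Pretraining-Driven Multimodal Boundary-Aware Vision Transformer. Journal of Software (软件学报), 2023, 34(5): 0.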

History
  • Received: 2022-04-15
  • Last revised: 2022-08-03
  • Published online: 2022-09-20
Copyright: Institute of Software, Chinese Academy of Sciences
Address: No. 4 South Fourth Street, Zhongguancun, Haidian District, Beijing 100190
Tel: 010-62562563  Fax: 010-62562533  Email: jos@iscas.ac.cn