Text-based Person Search via Virtual Attribute Learning

Author:
Affiliation:
Author biography:
Corresponding author:
CLC number:
Fund project: National Natural Science Foundation of China (61876159, 62076210, 62076116)

Abstract:

Text-based person search aims to retrieve, from a pedestrian database, images of the person that match a given text description, and it has attracted broad attention from both academia and industry in recent years. The task faces two challenges at once: fine-grained retrieval and the heterogeneous gap between images and texts. Some methods use supervised attribute learning to extract attribute-related features and associate images and texts at a fine-grained level. However, attribute annotations are hard to obtain, which limits these methods in practice. How to extract attribute-related features and establish fine-grained cross-modal semantic associations without attribute annotations thus becomes a key problem. To address this issue, this paper incorporates pre-training models and proposes a text-based person search method based on virtual attribute learning, which associates images and texts at a fine-grained level through unsupervised attribute learning. First, based on the invariance and cross-modal semantic consistency of pedestrian attributes, a semantics-guided attribute decoupling method is proposed, which uses identity labels as the supervision signal to guide the model to decouple attribute-related features. Second, a feature learning module based on semantic reasoning is proposed, which constructs a semantic graph with the learned attributes as nodes and the relations between attributes as edges; exchanging information among the attributes through this graph model enhances the cross-modal identification ability of the features. Comparative experiments with existing methods on the public text-based person search dataset CUHK-PEDES and the cross-modal retrieval dataset Flickr30k demonstrate the effectiveness of the proposed approach.
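The abstract describes the semantics-guided attribute decoupling module only at a high level. The following is a minimal, hypothetical PyTorch sketch of that idea, not the authors' implementation: the number of virtual attribute slots, the per-slot projection heads, the consistency and diversity terms, and all loss weights are assumptions made for illustration. The only parts taken from the abstract are that identity labels act as the sole supervision signal and that matched image and text attribute features should agree across modalities.

```python
# Minimal sketch (not the authors' code): decouple a global image/text feature
# into K "virtual attribute" sub-features supervised only by identity labels.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VirtualAttributeDecoupler(nn.Module):
    def __init__(self, feat_dim=768, num_attrs=8, attr_dim=96, num_ids=1000):
        super().__init__()
        # One projection head per virtual attribute slot (hypothetical design).
        self.heads = nn.ModuleList(
            [nn.Linear(feat_dim, attr_dim) for _ in range(num_attrs)]
        )
        # Identity classifier over the concatenated slots: person identity
        # labels are the only supervision, as stated in the abstract.
        self.id_classifier = nn.Linear(num_attrs * attr_dim, num_ids)

    def forward(self, global_feat):
        # global_feat: (B, feat_dim) from a pre-trained image or text encoder.
        attrs = [F.normalize(h(global_feat), dim=-1) for h in self.heads]
        attrs = torch.stack(attrs, dim=1)                 # (B, K, attr_dim)
        id_logits = self.id_classifier(attrs.flatten(1))  # (B, num_ids)
        return attrs, id_logits

def decoupling_loss(img_attrs, txt_attrs, img_logits, txt_logits, ids):
    # Identity loss: both modalities must predict the same person identity.
    id_loss = F.cross_entropy(img_logits, ids) + F.cross_entropy(txt_logits, ids)
    # Cross-modal consistency: matched image/text attribute slots should agree.
    consistency = (1 - F.cosine_similarity(img_attrs, txt_attrs, dim=-1)).mean()
    # Diversity (assumed regularizer): keep different slots from collapsing.
    sim = torch.einsum('bkd,bld->bkl', img_attrs, img_attrs)
    off_diag = sim - torch.diag_embed(torch.diagonal(sim, dim1=1, dim2=2))
    diversity = off_diag.abs().mean()
    return id_loss + consistency + 0.1 * diversity
```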

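Likewise, a small graph-convolution-style layer is one plausible reading of the "exchange information among attributes" step: the learned attributes become graph nodes and their relations become edges. The learnable relation matrix, the residual update, and the per-attribute cosine matching score below are illustrative assumptions rather than details given in the abstract.

```python
# Assumed sketch of semantic-graph reasoning over virtual attribute features;
# not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticGraphReasoning(nn.Module):
    def __init__(self, attr_dim=96, num_attrs=8):
        super().__init__()
        # Learnable relation (adjacency) matrix between attribute nodes; a
        # fixed graph built from attribute co-occurrence would also work here.
        self.relation = nn.Parameter(torch.randn(num_attrs, num_attrs) * 0.01)
        self.transform = nn.Linear(attr_dim, attr_dim)

    def forward(self, attrs):
        # attrs: (B, K, attr_dim) virtual attribute features of either modality.
        adj = F.softmax(self.relation, dim=-1)            # row-normalized edges
        messages = torch.einsum('kl,bld->bkd', adj, attrs)
        # Residual update: keep each attribute's own evidence while mixing in
        # information from related attributes.
        out = attrs + F.relu(self.transform(messages))
        return F.normalize(out, dim=-1)

def match_score(reasoner, img_attrs, txt_attrs):
    # Usage sketch: reason over both modalities with shared weights, then score
    # an image-text pair by summing per-attribute cosine similarities.
    img_r, txt_r = reasoner(img_attrs), reasoner(txt_attrs)
    return F.cosine_similarity(img_r, txt_r, dim=-1).sum(dim=-1)  # (B,)
```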
Cite this article:

王成济, 苏家威, 罗志明, 曹冬林, 林耀进, 李绍滋. Text-based person search via virtual attribute learning. 软件学报 (Journal of Software), 2023, 34(5).

History
  • Received: 2022-04-12
  • Revised: 2022-05-29
  • Accepted:
  • Online: 2022-09-20
  • Published: