
IMG-Net: inner-cross-modal attentional multigranular network for description-based person re-identification


Abstract

Given a natural language description, description-based person re-identification aims to retrieve images of the matched person from a large-scale visual database. Due to modality heterogeneity, it is challenging to measure the cross-modal similarity between images and text descriptions. Many existing approaches use a deep-learning model to encode local and global fine-grained features with a strict uniform partition strategy. This breaks part coherence, making it difficult to capture meaningful within-part information and semantic information among body parts. To address this issue, we propose an inner-cross-modal attentional multigranular network (IMG-Net) that incorporates inner-modal self-attention and cross-modal hard-region attention into a fine-grained model to extract multigranular semantic information. Specifically, an inner-modal self-attention module is proposed to address the broken within-part consistency using both spatial-wise and channel-wise information. It is followed by a multigranular feature extraction module, which extracts rich local and global visual and textual features with the help of group normalization (GN). A cross-modal hard-region attention module is then proposed to obtain the local visual representation and phrase representation. Furthermore, GN is used instead of batch normalization for accurate batch statistics estimation. Comprehensive experiments with ablation analysis demonstrate that IMG-Net achieves state-of-the-art performance on the CUHK-PEDES dataset and significantly outperforms previous methods. (C) 2020 SPIE and IS&T
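The abstract does not give implementation details. As a rough illustration only, the following is a minimal PyTorch sketch of an inner-modal self-attention block that combines channel-wise and spatial-wise attention over a visual feature map and uses group normalization in place of batch normalization; the module name, layer sizes, and CBAM-style design are assumptions for illustration, not the authors' actual IMG-Net implementation.

```python
import torch
import torch.nn as nn

class InnerModalSelfAttention(nn.Module):
    """Hypothetical sketch: channel-wise + spatial-wise attention with GroupNorm."""

    def __init__(self, channels: int, reduction: int = 16, gn_groups: int = 32):
        super().__init__()
        # Channel-wise attention: squeeze spatial dims, re-weight channels.
        self.channel_fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial-wise attention: pool over channels, re-weight each location.
        self.spatial_conv = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )
        # GroupNorm instead of BatchNorm, as suggested in the abstract.
        self.norm = nn.GroupNorm(gn_groups, channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, height, width) visual feature map
        x = x * self.channel_fc(x)                      # channel-wise re-weighting
        avg_map = x.mean(dim=1, keepdim=True)           # (B, 1, H, W)
        max_map, _ = x.max(dim=1, keepdim=True)         # (B, 1, H, W)
        x = x * self.spatial_conv(torch.cat([avg_map, max_map], dim=1))
        return self.norm(x)

if __name__ == "__main__":
    feats = torch.randn(4, 256, 24, 8)   # e.g. part-level CNN features
    attn = InnerModalSelfAttention(256)
    print(attn(feats).shape)             # torch.Size([4, 256, 24, 8])
```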

Bibliographic details

  • Source: Journal of electronic imaging, 2020, No. 4, pp. 043028.1-043028.18 (18 pages)
  • Author affiliations

    Nanjing Tech Univ, Sch Comp Sci & Technol, Nanjing, Peoples R China;

    Nanjing Tech Univ, Sch Comp Sci & Technol, Nanjing, Peoples R China | China Univ Min & Technol, Sch Informat & Control Engn, Xuzhou, Jiangsu, Peoples R China;

    Nanjing Tech Univ, Sch Comp Sci & Technol, Nanjing, Peoples R China;

    Nanjing Tech Univ, Sch Comp Sci & Technol, Nanjing, Peoples R China;

    Nanjing Tech Univ, Sch Comp Sci & Technol, Nanjing, Peoples R China;

    Nanjing Tech Univ, Sch Comp Sci & Technol, Nanjing, Peoples R China;

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI)
  • Format: PDF
  • Language: English (eng)
  • Chinese Library Classification:
  • Keywords

    person re-identification; natural language description; multigranular matching;

  • Date added: 2022-08-19 01:58:49
