IEEE Transactions on Image Processing

SCAN: Self-and-Collaborative Attention Network for Video Person Re-Identification


Abstract

Video person re-identification has attracted much attention in recent years. It aims to match image sequences of pedestrians across different camera views. Previous approaches usually improve this task from three aspects: 1) selecting more discriminative frames; 2) generating more informative temporal representations; and 3) developing more effective distance metrics. To address these issues, we present a novel and practical deep architecture for video person re-identification, termed the Self-and-Collaborative Attention Network (SCAN), which takes video pairs as input and outputs their matching scores. SCAN has several appealing properties. First, SCAN adopts a non-parametric attention mechanism to refine the intra-sequence and inter-sequence feature representations of videos and outputs a self-and-collaborative feature representation for each video, aligning the discriminative frames between the probe and gallery sequences. Second, going beyond existing models, a generalized pairwise similarity measurement is proposed to generate the similarity feature representation of a video pair by calculating the Hadamard product of its self-representation difference and collaborative-representation difference; the matching result can then be predicted by a binary classifier. Third, a dense clip segmentation strategy is introduced to generate rich probe-gallery pairs for optimizing the model. In the test phase, the final matching score of two videos is determined by averaging the scores of the top-ranked clip pairs. Extensive experiments demonstrate the effectiveness of SCAN, which outperforms the best-performing baselines in top-1 accuracy on the iLIDS-VID, PRID2011, and MARS datasets.
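
The pairwise similarity measurement described in the abstract can be illustrated with a minimal sketch, assuming PyTorch and per-clip feature vectors. Only the Hadamard-product construction, the binary classifier, and the test-time averaging of top-ranked clip pairs come from the abstract; the class name SimilarityHead, the feature dimension, the classifier layers, and the exact pairing of the two differences are illustrative assumptions, not the paper's implementation.

    # A hypothetical sketch of SCAN's similarity head, not the authors' code.
    import torch
    import torch.nn as nn

    class SimilarityHead(nn.Module):
        def __init__(self, feat_dim: int = 2048):
            super().__init__()
            # Binary classifier over the similarity feature: match vs. non-match.
            # Layer sizes are assumptions for illustration.
            self.classifier = nn.Sequential(
                nn.Linear(feat_dim, 256),
                nn.ReLU(inplace=True),
                nn.Linear(256, 1),
            )

        def forward(self, probe_self, gallery_self, probe_collab, gallery_collab):
            # Differences between the pair's self- and collaborative-representations.
            d_self = probe_self - gallery_self
            d_collab = probe_collab - gallery_collab
            # Hadamard (element-wise) product forms the similarity feature.
            sim_feat = d_self * d_collab
            # Matching score in (0, 1); higher means more likely the same identity.
            return torch.sigmoid(self.classifier(sim_feat)).squeeze(-1)

    head = SimilarityHead(feat_dim=2048)
    # Eight hypothetical probe-gallery clip pairs from dense clip segmentation.
    ps, gs, pc, gc = (torch.randn(8, 2048) for _ in range(4))
    scores = head(ps, gs, pc, gc)  # shape (8,), one score per clip pair
    # Test phase, per the abstract: average the scores of the top-ranked clip pairs.
    final_score = scores.topk(k=5).values.mean()

The value of k for the top-ranked clip pairs is not given in the abstract; 5 here is a placeholder.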