Asian Conference on Computer Vision

Video-Based Person Re-identification via 3D Convolutional Networks and Non-local Attention

Abstract

Video-based person re-identification (ReID) is a challenging problem in which video tracks of people captured by non-overlapping cameras must be matched. Feature aggregation from a video track is a key step in video-based person ReID. Many existing methods tackle this problem with average/maximum temporal pooling or RNNs with attention. However, these methods cannot handle temporal dependencies and spatial misalignment at the same time. We draw inspiration from video action recognition, which involves identifying different actions from video tracks. First, we apply 3D convolutions to the video volume, instead of 2D convolutions across frames, to extract spatial and temporal features simultaneously. Second, we use a non-local block to tackle the misalignment problem and capture spatial-temporal long-range dependencies. As a result, the network can learn useful spatial-temporal information as a weighted sum of the features at all spatial and temporal positions in the input feature map. Experimental results on three datasets show that our framework outperforms state-of-the-art approaches by a large margin on multiple metrics.
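The two building blocks the abstract describes can be sketched in a few lines of PyTorch. The sketch below is an assumption, not the authors' released code: a single 3D convolution stands in for the full backbone, the non-local block follows the embedded-Gaussian formulation of Wang et al. (2018) that such blocks are based on, and all names (Toy3DReIDNet, NonLocalBlock3D) and layer sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class NonLocalBlock3D(nn.Module):
    """Embedded-Gaussian non-local block over a (N, C, T, H, W) feature map.

    The output at each space-time position is a weighted sum of the
    features at all positions, added back to the input as a residual.
    """

    def __init__(self, channels: int):
        super().__init__()
        inter = channels // 2  # bottleneck width, a common choice
        self.theta = nn.Conv3d(channels, inter, kernel_size=1)
        self.phi = nn.Conv3d(channels, inter, kernel_size=1)
        self.g = nn.Conv3d(channels, inter, kernel_size=1)
        self.out = nn.Conv3d(inter, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, _, t, h, w = x.shape
        # Flatten all T*H*W positions so every position attends to every other.
        theta = self.theta(x).flatten(2).transpose(1, 2)  # (N, THW, C')
        phi = self.phi(x).flatten(2)                      # (N, C', THW)
        g = self.g(x).flatten(2).transpose(1, 2)          # (N, THW, C')
        attn = F.softmax(theta @ phi, dim=-1)             # pairwise affinities
        y = (attn @ g).transpose(1, 2).reshape(n, -1, t, h, w)
        return x + self.out(y)                            # residual connection


class Toy3DReIDNet(nn.Module):
    """A 3D convolution over the whole video volume, then a non-local block."""

    def __init__(self, feat_dim: int = 64):
        super().__init__()
        # One 3D conv extracts spatial and temporal features jointly,
        # unlike 2D convs applied frame by frame.
        self.conv = nn.Conv3d(3, feat_dim, kernel_size=3,
                              stride=(1, 2, 2), padding=1)
        self.attn = NonLocalBlock3D(feat_dim)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        feat = F.relu(self.conv(clip))   # (N, C, T, H/2, W/2)
        feat = self.attn(feat)           # long-range space-time dependencies
        return feat.mean(dim=(2, 3, 4))  # pool to one descriptor per track


if __name__ == "__main__":
    clip = torch.randn(2, 3, 4, 32, 16)  # two tiny RGB tracklets
    print(Toy3DReIDNet()(clip).shape)    # torch.Size([2, 64])
```

The residual form `x + self.out(y)` lets the block be inserted into an existing backbone without disturbing its initial behavior; in non-local networks the final 1x1x1 convolution is often zero-initialized for exactly this reason.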
