In this paper we propose a novel approach for video-based person re-identification that exploits convolutional neural networks to learn the similarity of persons observed from video cameras. We use 3-dimensional convolutional neural networks (3D CNNs) to extract fine-grained spatiotemporal features from the video sequence of a person. Unlike recurrent neural networks, a 3D CNN preserves the spatial patterns of the input, which works well for the re-identification problem. The network maps each video sequence of a person to a Euclidean space where distances between feature embeddings directly correspond to measures of person similarity. With our improved parameter learning method, called the entire triplet loss, all possible triplets in the mini-batch are taken into account when updating network parameters. This parameter updating method significantly improves training, enabling the embeddings to be more discriminative. Experimental results show that our model achieves new state-of-the-art identification rates on the iLIDS-VID and PRID-2011 datasets, with 82.0% and 83.3% at rank 1, respectively.
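The idea of the entire triplet loss can be illustrated with a minimal sketch: enumerate every valid (anchor, positive, negative) triplet in a mini-batch and average a margin-based hinge loss over them. The function name, margin value, and toy data below are illustrative assumptions, not details from the paper.

```python
import numpy as np

def entire_triplet_loss(embeddings, labels, margin=0.3):
    """Batch-all triplet loss sketch: averages the hinge loss over every
    valid (anchor, positive, negative) triplet in the mini-batch.
    The margin value is an illustrative choice, not from the paper."""
    n = len(embeddings)
    # Pairwise squared Euclidean distances between all embeddings.
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dist = np.sum(diff ** 2, axis=-1)
    losses = []
    for a in range(n):
        for p in range(n):
            for neg in range(n):
                # A triplet is valid when the positive shares the anchor's
                # identity and the negative has a different identity.
                if a != p and labels[a] == labels[p] and labels[a] != labels[neg]:
                    losses.append(max(dist[a, p] - dist[a, neg] + margin, 0.0))
    return float(np.mean(losses)) if losses else 0.0

# Toy mini-batch: two identities with two embeddings each.
emb = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0]])
lab = np.array([0, 0, 1, 1])
print(entire_triplet_loss(emb, lab))
```

Because every triplet contributes a gradient signal rather than a single sampled one, each mini-batch yields many more training constraints, which is consistent with the improved discriminability the abstract claims.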