Matching pedestrians across multiple camera views, known as human re-identification, is a challenging research problem that has numerous applications in visual surveillance. With the resurgence of Convolutional Neural Networks (CNNs), several end-to-end deep Siamese CNN architectures have been proposed for human re-identification, with the objective of projecting the images of similar pairs (i.e., same identity) closer to each other and those of dissimilar pairs farther apart. However, current networks extract a fixed representation for each image regardless of the other images it is paired with, and the comparison with other images is done only at the final level. In this setting, the network is at risk of failing to extract finer local patterns that may be essential to distinguish positive pairs from hard negative pairs. In this paper, we propose a gating function to selectively emphasize such fine common local patterns by comparing the mid-level features across pairs of images. This produces flexible representations for the same image according to the images it is paired with. We conduct experiments on the CUHK03, Market-1501 and VIPeR datasets and demonstrate improved performance compared to a baseline Siamese CNN architecture.
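The idea of gating mid-level features by cross-image comparison can be illustrated with a minimal sketch. The abstract does not give the gate's formula, so everything below is an assumption made for illustration: the Gaussian similarity gate, the residual-style update `x + g * x`, the parameter `sigma`, and the function names `gate` and `gated_pair` are all hypothetical and are not the authors' implementation. Each image is represented as a list of local feature vectors, one per spatial location.

```python
import math

def gate(f1, f2, sigma=1.0):
    """Hypothetical gate in (0, 1]: close to 1 where two local features agree.

    Assumption: a Gaussian of the squared Euclidean distance. The abstract
    does not specify the actual gating function.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(f1, f2))
    return math.exp(-sq_dist / (sigma ** 2))

def gated_pair(feats1, feats2, sigma=1.0):
    """Emphasize local features that are common to both images.

    feats1, feats2: lists of local feature vectors, aligned by location.
    Returns the two gated feature lists. The update x + g * x is an
    assumed form that boosts a feature in proportion to the gate value.
    """
    out1, out2 = [], []
    for f1, f2 in zip(feats1, feats2):
        g = gate(f1, f2, sigma)
        out1.append([a + g * a for a in f1])
        out2.append([b + g * b for b in f2])
    return out1, out2

# The same probe image paired with two different partners.
probe = [[1.0, 2.0], [1.0, 0.0]]
similar = [[1.0, 2.0], [1.0, 0.0]]      # identical local patterns
different = [[9.0, 9.0], [10.0, 10.0]]  # very different local patterns

probe_vs_similar, _ = gated_pair(probe, similar)
probe_vs_different, _ = gated_pair(probe, different)
```

Under these assumptions, the probe's representation depends on its partner: matching local patterns are amplified, while mismatched ones pass through almost unchanged. This is the sense in which the representation is flexible rather than fixed.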