PROBLEM TO BE SOLVED: To train a neural network that extracts image features highly robust to image regions with no discriminative power, while minimizing the number of parameters in the pooling layer.
SOLUTION: A convolutional neural network includes a fully convolutional layer, which outputs a feature tensor of an input image by applying convolution to the input image; a weight matrix estimation layer, which estimates a weight matrix indicating the weight of each element of the feature tensor; and a pooling layer, which extracts a feature vector of the input image based on the feature tensor and the weight matrix. The network is applied to a first image and a second image, which are matching images, to obtain a first feature vector and a second feature vector, and a loss function is expressed using the distance between the two vectors. A parameter learning unit 130 learns the parameters of each layer of the convolutional neural network so that the value obtained by calculating the loss function becomes small.
SELECTED DRAWING: Figure 2
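The data flow described above can be illustrated with a minimal sketch. It is not the claimed implementation: the abstract does not give the form of any layer, so everything concrete here is an assumption. The feature tensor is taken as already produced by the fully convolutional layer and stored as nested lists indexed [row][column][channel]. The hypothetical `estimate_weights` uses a 1x1 convolution followed by a sigmoid, giving one weight per spatial position. The hypothetical `weighted_pool` takes a weighted average and has no parameters of its own. The hypothetical `matching_pair_loss` uses the squared Euclidean distance. Parameter learning (for example by gradient descent) is omitted.

```python
import math


def estimate_weights(feature_tensor, w, b):
    """Assumed weight matrix estimation layer.

    Applies a 1x1 convolution (dot product with w over channels, plus bias b)
    and a sigmoid, so each spatial position gets a weight in (0, 1).
    """
    return [
        [
            1.0 / (1.0 + math.exp(-(sum(wi * x for wi, x in zip(w, cell)) + b)))
            for cell in row
        ]
        for row in feature_tensor
    ]


def weighted_pool(feature_tensor, weights):
    """Assumed pooling layer: weighted average of per-position feature vectors.

    Positions with small weights (e.g. non-discriminative regions) contribute
    little to the resulting feature vector. The weights must not sum to zero.
    """
    channels = len(feature_tensor[0][0])
    pooled = [0.0] * channels
    total = 0.0
    for feat_row, weight_row in zip(feature_tensor, weights):
        for cell, weight in zip(feat_row, weight_row):
            total += weight
            for c in range(channels):
                pooled[c] += weight * cell[c]
    return [p / total for p in pooled]


def matching_pair_loss(vec_a, vec_b):
    """Assumed loss: squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(vec_a, vec_b))


def extract_feature(feature_tensor, w, b):
    """Weight estimation followed by pooling, for one image's feature tensor."""
    return weighted_pool(feature_tensor, estimate_weights(feature_tensor, w, b))


if __name__ == "__main__":
    # Two tiny 2x2x2 feature tensors standing in for a pair of matching images.
    first = [[[1.0, 0.0], [3.0, 2.0]], [[5.0, 4.0], [7.0, 6.0]]]
    second = [[[1.0, 0.5], [3.0, 2.0]], [[5.0, 4.0], [6.0, 6.0]]]
    w, b = [0.1, -0.2], 0.0  # illustrative parameters, not learned values
    loss = matching_pair_loss(
        extract_feature(first, w, b), extract_feature(second, w, b)
    )
    print(loss)
```

In this sketch the only trainable parameters outside the convolutional layers are `w` and `b` of the weight estimation step; the pooling step itself adds none. A learning procedure would adjust all layer parameters to reduce the loss over pairs of matching images.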