In this paper, we propose a discriminative aggregation network (DAN) method for video face recognition, which aims to integrate information from video frames effectively and efficiently. Unlike existing aggregation methods, our method aggregates raw video frames directly instead of the features obtained by complex processing. By combining the idea of metric learning and adversarial learning, we learn an aggregation network that produces more discriminative synthesized images compared to raw input frames. Our framework reduces the number of frames to be processed and significantly speed up the recognition procedure. Furthermore, low-quality frames containing misleading information are filtered and denoised during the aggregation process, which makes our system more robust and discriminative. Experimental results show that our method can generate discriminative images from video clips and improve the overall recognition performance in both the speed and accuracy on three widely used datasets.
展开▼