International Joint Conference on Neural Networks (IJCNN)

Dual-Triplet Metric Learning for Unsupervised Domain Adaptation in Video Face Recognition



Abstract

The scalability and complexity of deep learning models remain key issues in many visual recognition applications. For instance, in video surveillance, fine-tuning a model with labeled image data from each new camera is required to reduce the domain shift between videos captured in the source domain (laboratory setting) and the target domain (operational environment). In many video surveillance applications, such as face recognition and person re-identification, a pair-wise matcher is typically employed to assign a query image captured with a video camera to the corresponding reference images in a gallery. The different configuration, viewpoint, and operational conditions of each camera can introduce significant shifts in pair-wise distance distributions, resulting in a decline in recognition performance for new cameras. In this paper, a new deep domain adaptation (DA) method is proposed to adapt the CNN embedding of a Siamese network using unlabeled tracklets captured with a new video camera. To this end, a dual-triplet loss is introduced for metric learning, where two triplets are constructed using video data from a source camera and a new target camera. To form the dual triplets, a mutual-supervised learning approach is introduced in which the source camera acts as a teacher, providing the target camera with an initial embedding. The student then relies on the teacher to iteratively label the positive and negative pairs collected during, e.g., initial camera calibration. Both source and target embeddings continue to learn simultaneously so that their pair-wise distance distributions become aligned. For validation, the proposed metric learning technique is used to train deep Siamese networks under different training scenarios, and is compared against state-of-the-art techniques for still-to-video face recognition (FR) on the COX-S2V and a private video-based FR dataset. Results indicate that the proposed method can provide a level of accuracy comparable to the upper-bound performance of the training scenario in which labeled target data is employed to fine-tune the Siamese network.
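To make the training signal concrete, the following is a minimal PyTorch sketch of a dual-triplet objective in the spirit of the abstract: one triplet is drawn from labeled source-camera video, and one from target-camera tracklets whose positive/negative pairs are pseudo-labeled by the teacher embedding. The function names, the margin value, the equal weighting of the two terms, and the distance-threshold labeling rule are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Hinge loss on Euclidean embedding distances: pull the positive
    # closer to the anchor than the negative by at least `margin`.
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()

def dual_triplet_loss(src_triplet, tgt_triplet, margin=0.2):
    # Sum of a source-camera triplet (ground-truth labels) and a
    # target-camera triplet (teacher pseudo-labels), so that both
    # embeddings keep learning and their pair-wise distance
    # distributions stay aligned. (Illustrative: the paper may weight
    # or schedule the two terms differently.)
    loss_src = triplet_loss(*src_triplet, margin=margin)
    loss_tgt = triplet_loss(*tgt_triplet, margin=margin)
    return loss_src + loss_tgt

@torch.no_grad()
def pseudo_label_pair(teacher, tracklet_a, tracklet_b, threshold=0.5):
    # Hypothetical teacher labeling rule for this sketch: a target-camera
    # pair is labeled positive when the mean embeddings of the two
    # tracklets (each of shape [frames, dim]) fall within `threshold`.
    e_a = teacher(tracklet_a).mean(dim=0, keepdim=True)
    e_b = teacher(tracklet_b).mean(dim=0, keepdim=True)
    return (F.pairwise_distance(e_a, e_b) < threshold).item()
```

In a training loop built on this sketch, pairs from the unlabeled target camera would first be pseudo-labeled by the teacher via `pseudo_label_pair`, mined into target triplets, and then combined with labeled source triplets in `dual_triplet_loss` at every iteration, consistent with the iterative teacher-student labeling the abstract describes.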
