IEEE Winter Conference on Applications of Computer Vision

Unsupervised Adversarial Visual Level Domain Adaptation for Learning Video Object Detectors From Images

Abstract

Deep learning based object detectors require thousands of diversified bounding-box and class-annotated examples. Though image object detectors have progressed rapidly in recent years with the release of multiple large-scale static image datasets, object detection on videos remains an open problem due to the scarcity of annotated video frames. A robust video object detector is an essential component for video understanding and for curating large-scale automated annotations in videos. The domain difference between images and videos makes the transfer of image object detectors to videos sub-optimal. The most common solution is to use weakly supervised annotations, where each video frame must be tagged for the presence or absence of object categories; this still requires manual effort. In this paper we take a step forward by adapting unsupervised adversarial image-to-image translation to perturb static high-quality images so that they become visually indistinguishable from a set of video frames. We assume the presence of a fully annotated static image dataset and an unannotated video dataset. The object detector is then trained on the adversarially transformed image dataset using the annotations of the original dataset. Experiments on the Youtube-Objects and Youtube-Objects-Subset datasets with two contemporary baseline object detectors reveal that such unsupervised pixel-level domain adaptation boosts generalization performance on video frames compared with direct application of the original image object detector. We also achieve performance competitive with recent weakly supervised baselines. This paper can be seen as an application of image translation to cross-domain object detection. Code is available at https://github.com/avisekiit/wacv_2019.
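The pipeline the abstract describes has two stages: an unsupervised adversarial translator that maps annotated static images into the visual style of video frames, and a standard detector trained on the translated images with the original bounding boxes. Below is a minimal PyTorch sketch of the translation stage in the CycleGAN family of methods; the tiny Generator/Discriminator networks, the loss weights, and the placeholder tensors are illustrative assumptions rather than the paper's actual architecture (the linked repository has the real code).

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1),
        nn.InstanceNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

class Generator(nn.Module):
    """Toy image-to-image translator (stand-in for a full CycleGAN generator)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            conv_block(3, 64),
            conv_block(64, 64),
            nn.Conv2d(64, 3, 3, padding=1),
            nn.Tanh(),
        )
    def forward(self, x):
        return self.net(x)

class Discriminator(nn.Module):
    """Toy patch critic: does this frame look like real video data?"""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            conv_block(3, 64),
            conv_block(64, 64),
            nn.Conv2d(64, 1, 3, padding=1),
        )
    def forward(self, x):
        return self.net(x)

G_i2v = Generator()        # static image -> video-frame style
G_v2i = Generator()        # video frame  -> image style (for cycle consistency)
D_vid = Discriminator()    # real video frame vs. translated image

opt_g = torch.optim.Adam(
    list(G_i2v.parameters()) + list(G_v2i.parameters()), lr=2e-4)
opt_d = torch.optim.Adam(D_vid.parameters(), lr=2e-4)
gan_loss, cyc_loss = nn.MSELoss(), nn.L1Loss()

# Placeholder batches; in practice these come from the annotated image
# dataset and the unannotated video frames.
images = torch.randn(4, 3, 128, 128)
frames = torch.randn(4, 3, 128, 128)

# Discriminator step: real video frames vs. translated (fake) frames.
fake_frames = G_i2v(images)
d_real, d_fake = D_vid(frames), D_vid(fake_frames.detach())
loss_d = (gan_loss(d_real, torch.ones_like(d_real)) +
          gan_loss(d_fake, torch.zeros_like(d_fake)))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step: fool the discriminator and stay cycle-consistent
# (one direction of the cycle shown; a full CycleGAN trains both).
fake_frames = G_i2v(images)
d_fake = D_vid(fake_frames)
loss_g = (gan_loss(d_fake, torch.ones_like(d_fake)) +
          10.0 * cyc_loss(G_v2i(fake_frames), images))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()

# `fake_frames` now look video-like but keep the source geometry, so the
# original bounding-box annotations remain valid for detector training.
```

Because the translation is pixel-level and geometry-preserving, the translated images reuse the source annotations unchanged, which is what allows the detector to be trained without any video labels.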