IEEE International Conference on Computer Vision

Unsupervised Learning of Important Objects from First-Person Videos

Abstract

A first-person camera, mounted on a person's head, captures which objects are important to the camera wearer. Most prior methods for this task learn to detect such important objects from manually labeled first-person data in a supervised fashion. However, important objects are strongly related to the camera wearer's internal state, such as their intentions and attention, and thus only the person wearing the camera can provide the importance labels. This constraint makes the annotation process costly and limits its scalability. In this work, we show that we can detect important objects in first-person images without supervision from the camera wearer or even from third-person labelers. We formulate important object detection as an interplay between 1) segmentation and 2) recognition agents. The segmentation agent first proposes a possible important object segmentation mask for each image and then feeds it to the recognition agent, which learns to predict an important object mask using visual semantics and spatial features. We implement this interplay between the two agents via an alternating cross-pathway supervision scheme inside our proposed Visual-Spatial Network (VSN). Our VSN consists of spatial ("where") and visual ("what") pathways: the visual pathway learns common visual semantics, while the spatial pathway focuses on spatial location cues. Our unsupervised learning is accomplished via cross-pathway supervision, where one pathway feeds its predictions to the segmentation agent, which proposes a candidate important object segmentation mask that the other pathway then uses as a supervisory signal. We demonstrate our method on two different important object datasets, where it achieves results similar to or better than those of supervised methods.
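The alternating cross-pathway supervision described in the abstract can be illustrated with a toy sketch. This is not the authors' VSN: each "pathway" here is a hypothetical 1-D nearest-centroid classifier over a single per-pixel feature (normalized distance from the image center for the spatial pathway, a scalar "semantic" score for the visual pathway), the segmentation agent is an assumed fixed threshold at 0.5, and the spatial pathway is seeded with an assumed center-bias prior so the first round has something to propose. The names `Pathway`, `segmentation_agent`, and `cross_pathway_training`, and the sample data, are all illustrative.

```python
from typing import List, Tuple

# Toy per-pixel features for one image (hypothetical data, not from the paper).
# SPATIAL: normalized distance from the image center (0 = center, 1 = border).
# VISUAL: scalar stand-in for "visual semantics" (higher = more object-like).
SPATIAL = [0.1, 0.2, 0.3, 0.4, 0.45, 0.6, 0.7, 0.8, 0.9]
VISUAL = [0.9, 0.8, 0.9, 0.7, 0.1, 0.2, 0.1, 0.3, 0.2]


class Pathway:
    """Hypothetical stand-in for one VSN pathway: a 1-D nearest-centroid classifier."""

    def __init__(self, pos: float = 0.5, neg: float = 0.5):
        self.pos, self.neg = pos, neg  # feature centroids of important / background pixels

    def predict(self, feats: List[float]) -> List[float]:
        # Importance score in [0, 1]: relative closeness to the "important" centroid.
        scores = []
        for x in feats:
            dp, dn = abs(x - self.pos), abs(x - self.neg)
            scores.append(dn / (dp + dn) if dp + dn > 0 else 0.5)
        return scores

    def fit(self, feats: List[float], mask: List[int]) -> None:
        # Re-estimate centroids from a pseudo-label mask (no human labels involved).
        pos = [x for x, m in zip(feats, mask) if m == 1]
        neg = [x for x, m in zip(feats, mask) if m == 0]
        if pos and neg:  # keep previous centroids if the mask is degenerate
            self.pos, self.neg = sum(pos) / len(pos), sum(neg) / len(neg)


def segmentation_agent(scores: List[float], threshold: float = 0.5) -> List[int]:
    # Hypothetical agent: binarize a pathway's prediction into a candidate mask.
    return [1 if s >= threshold else 0 for s in scores]


def cross_pathway_training(
    visual_feats: List[float], spatial_feats: List[float], rounds: int = 4
) -> Tuple[List[int], Pathway, Pathway]:
    visual = Pathway()  # uninformative start: predicts 0.5 everywhere
    spatial = Pathway(pos=0.0, neg=1.0)  # assumed center-bias prior
    mask: List[int] = []
    for r in range(rounds):
        if r % 2 == 0:
            # Spatial predictions -> agent -> mask that supervises the visual pathway.
            mask = segmentation_agent(spatial.predict(spatial_feats))
            visual.fit(visual_feats, mask)
        else:
            # Roles swap: visual predictions -> agent -> mask for the spatial pathway.
            mask = segmentation_agent(visual.predict(visual_feats))
            spatial.fit(spatial_feats, mask)
    return mask, visual, spatial


if __name__ == "__main__":
    final_mask, vis, spa = cross_pathway_training(VISUAL, SPATIAL)
    print(final_mask)  # prints [1, 1, 1, 1, 0, 0, 0, 0, 0]
```

In this toy run, the center prior first marks the pixel at distance 0.45 as important. The visual pathway, trained on that proposal, rejects it because its visual score is low, so the mask it hands back to the spatial pathway excludes it. The spatial pathway, which sees only location, still flags that pixel in later rounds, so the proposals from the two pathways keep alternating on it. The sketch only mirrors the flow of pseudo-labels between pathways; the VSN itself learns from full images.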