Unsupervised Learning of Important Objects from First-Person Videos

Abstract

A first-person camera, placed at a person's head, captures which objects are important to the camera wearer. Most prior methods for this task learn to detect such important objects from manually labeled first-person data in a supervised fashion. However, important objects are strongly related to the camera wearer's internal state, such as his intentions and attention, and thus only the person wearing the camera can provide the importance labels. Such a constraint makes the annotation process costly and limits its scalability. In this work, we show that we can detect important objects in first-person images without supervision by the camera wearer or even third-person labelers. We formulate the important object detection problem as an interplay between 1) a segmentation agent and 2) a recognition agent. The segmentation agent first proposes a possible important object segmentation mask for each image, and then feeds it to the recognition agent, which learns to predict an important object mask using visual semantics and spatial features. We implement this interplay between the two agents via an alternating cross-pathway supervision scheme inside our proposed Visual-Spatial Network (VSN). Our VSN consists of spatial ("where") and visual ("what") pathways, one of which learns common visual semantics while the other focuses on spatial location cues. Our unsupervised learning is accomplished via cross-pathway supervision, where one pathway feeds its predictions to a segmentation agent, which proposes a candidate important object segmentation mask that is then used by the other pathway as a supervisory signal. We show our method's success on two different important object datasets, where our method achieves results similar to or better than those of the supervised methods.
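The alternating cross-pathway supervision described in the abstract can be illustrated with a toy sketch. This is not the authors' VSN: each pathway is replaced here by a tiny per-pixel logistic model, the segmentation agent by a simple threshold, and the names `Pathway`, `segmentation_agent`, `alternate`, and `seed_mask` are all hypothetical. The sketch only shows the control flow, in which a mask proposed from one pathway's predictions becomes the training target for the other pathway.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class Pathway:
    """Toy per-pixel logistic model standing in for one pathway (assumption)."""

    def __init__(self, n_features):
        self.w = [0.0] * n_features
        self.b = 0.0

    def predict(self, feats):
        # feats: list of per-pixel feature vectors -> list of probabilities
        return [sigmoid(sum(w * x for w, x in zip(self.w, f)) + self.b)
                for f in feats]

    def fit(self, feats, mask, lr=0.5, steps=50):
        # Gradient descent on cross-entropy against the proposed mask.
        n = len(feats)
        for _ in range(steps):
            preds = self.predict(feats)
            for j in range(len(self.w)):
                grad = sum((p - m) * f[j]
                           for p, m, f in zip(preds, mask, feats)) / n
                self.w[j] -= lr * grad
            self.b -= lr * sum(p - m for p, m in zip(preds, mask)) / n


def segmentation_agent(probs):
    """Hypothetical stand-in: binarize a probability map at its mean value."""
    thr = sum(probs) / len(probs)
    return [1 if p > thr else 0 for p in probs]


def alternate(visual_feats, spatial_feats, seed_mask, rounds=4):
    """Alternate supervision: each round, one pathway trains on the mask
    proposed from the other pathway's predictions (seed_mask starts it off)."""
    visual = Pathway(len(visual_feats[0]))
    spatial = Pathway(len(spatial_feats[0]))
    mask = seed_mask
    for r in range(rounds):
        if r % 2 == 0:
            visual.fit(visual_feats, mask)
            mask = segmentation_agent(visual.predict(visual_feats))
        else:
            spatial.fit(spatial_feats, mask)
            mask = segmentation_agent(spatial.predict(spatial_feats))
    return mask


# Toy 4x4 "image": a bright 2x2 patch near the center (made-up data).
coords = [(x / 3.0, y / 3.0) for y in range(4) for x in range(4)]
visual_feats = [[1.0 if 1 <= x * 3 <= 2 and 1 <= y * 3 <= 2 else 0.1]
                for x, y in coords]
spatial_feats = [[x, y] for x, y in coords]
seed_mask = [1 if f[0] > 0.5 else 0 for f in visual_feats]  # arbitrary seed
mask = alternate(visual_feats, spatial_feats, seed_mask)
```

The seed mask is an arbitrary choice for this toy; the abstract does not say how the first mask proposal is obtained.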

Bibliographic details

Source: IEEE International Conference on Computer Vision | 2017 | pp. 1491-2231 | 9 pages
Authors: Gedas Bertasius; Hyun Soo Park; Stella X. Yu; Jianbo Shi
Original format: PDF
Chinese Library Classification: TP391.41-53

Similar literature

1. Unsupervised Primary Object Discovery in Videos Based on Evolutionary Primary Object Modeling With Reliable Object Proposals [J]. Yeong Jun Koh, Chang-Su Kim. IEEE Transactions on Image Processing, 2017, Issue 11.
2. Unsupervised learning of depth estimation, camera motion prediction and dynamic object localization from video [J]. Delong Yang, Xunyu Zhong, Dongbing Gu. International Journal of Advanced Robotic Systems, 2020, Issue 2.
3. An Adaptive Unsupervised Neural Network Based on Perceptual Mechanism for Dynamic Object Detection in Videos with Real Scenarios [J]. Ramirez-Quintana Juan A., Chacon-Murguia Mario I. Neural Processing Letters, 2015, Issue 3.
4. Unsupervised Learning of Important Objects from First-Person Videos [C]. Gedas Bertasius, Hyun Soo Park, Stella X. Yu. IEEE International Conference on Computer Vision, 2017.
5. Unsupervised offline video object segmentation using object enhancement and region merging [D]. Ryan, Ken. 2007.
6. Learning Invariant Object and Spatial View Representations in the Brain Using Slow Unsupervised Learning [O]. Edmund T. Rolls. 2021.
7. Unsupervised Learning of Important Objects from First-Person Videos [O]. Bertasius, Gedas; Park, Hyun Soo; Yu, Stella X. 2017.
