Venue: European Conference on Computer Vision

In the Eye of Beholder: Joint Learning of Gaze and Actions in First Person Video

Abstract

We address the task of jointly determining what a person is doing and where they are looking based on the analysis of video captured by a headworn camera. We propose a novel deep model for joint gaze estimation and action recognition in First Person Vision. Our method describes the participant's gaze as a probabilistic variable and models its distribution using stochastic units in a deep network. We sample from these stochastic units to generate an attention map. This attention map guides the aggregation of visual features in action recognition, thereby providing coupling between gaze and action. We evaluate our method on the standard EGTEA dataset and demonstrate performance that exceeds the state-of-the-art by a significant margin of 3.5%.
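
The coupling described in the abstract (sample an attention map from stochastic units, then use it to aggregate visual features) can be illustrated with a small NumPy sketch. This is not the authors' implementation. The abstract does not say how the stochastic units are sampled, so the Gumbel-softmax relaxation used here is an assumption, as are the function names `sample_attention_map` and `attention_pool`, the 7x7 grid, the 16 feature channels, and the temperature value.

```python
import numpy as np


def sample_attention_map(gaze_logits, rng, temperature=1.0):
    """Draw one relaxed sample from a categorical gaze distribution.

    gaze_logits: (H, W) unnormalized log-probabilities of gaze location.
    Returns an (H, W) non-negative map that sums to 1.
    Assumption: Gumbel-softmax sampling; the abstract only says "stochastic units".
    """
    h, w = gaze_logits.shape
    # Uniform noise, clipped away from 0 and 1 so both logs stay finite.
    u = np.clip(rng.uniform(size=(h, w)), 1e-9, 1.0 - 1e-9)
    gumbel = -np.log(-np.log(u))
    scores = (gaze_logits + gumbel) / temperature
    scores = scores - scores.max()  # numerical stability before exp
    weights = np.exp(scores)
    return weights / weights.sum()


def attention_pool(features, attention):
    """Aggregate a (C, H, W) feature map into a (C,) vector.

    Each spatial cell is weighted by the sampled attention map, so the
    pooled descriptor is dominated by the likely gaze region.
    """
    return (features * attention[None, :, :]).sum(axis=(1, 2))


# Hypothetical inputs standing in for network outputs.
rng = np.random.default_rng(0)
gaze_logits = rng.normal(size=(7, 7))    # would come from a gaze branch
features = rng.normal(size=(16, 7, 7))   # would come from a video backbone

attention = sample_attention_map(gaze_logits, rng, temperature=0.5)
pooled = attention_pool(features, attention)  # input to an action classifier
```

With a uniform attention map the pooling reduces to ordinary global average pooling, which is one way to see that the sampled map only reweights where the action features are read from.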
