Conference: Annual Conference on Neural Information Processing Systems

Action is in the Eye of the Beholder: Eye-gaze Driven Model for Spatio-Temporal Action Localization

Abstract

We propose a weakly-supervised structured learning approach for recognition and spatio-temporal localization of actions in video. As part of the proposed approach, we develop a generalization of the Max-Path search algorithm that allows us to efficiently search over a structured space of multiple spatio-temporal paths while also incorporating context information into the model. Instead of using spatial annotations in the form of bounding boxes to guide the latent model during training, we use human gaze data as a weak supervisory signal. This is achieved by incorporating eye gaze, along with the classification, into the structured loss within the latent SVM learning framework. Experiments on a challenging benchmark dataset, UCF-Sports, show that our model is more accurate in terms of classification and achieves state-of-the-art results in localization. In addition, our model can produce top-down saliency maps conditioned on the classification label and localized latent paths.
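
The abstract names the Max-Path search but this page does not describe it. As a rough illustration only, the sketch below shows the basic single-path idea under stated assumptions: each video frame is a grid of local detection scores, a path may move at most `radius` cells between consecutive frames, and it may start and end at any frame. The function name `max_path`, the `radius` parameter, and the score-grid layout are hypothetical choices made for this example. It does not implement the authors' generalization to multiple paths or their use of context.

```python
from typing import List, Optional, Tuple

Grid = List[List[float]]  # scores[y][x] for one frame (assumed layout)
Cell = Tuple[int, int, int]  # (t, y, x)


def max_path(scores: List[Grid], radius: int = 1) -> Tuple[float, List[Cell]]:
    """Return the highest-scoring spatio-temporal path and its total score.

    Hypothetical helper: assumes every frame has the same grid size.
    """
    best_score = float("-inf")
    best_end: Optional[Cell] = None
    prev: Optional[Grid] = None  # accumulated scores for frame t-1
    back = []  # back[t][y][x] = predecessor (y, x), or None if the path starts here

    for t, frame in enumerate(scores):
        h, w = len(frame), len(frame[0])
        acc = [[0.0] * w for _ in range(h)]
        ptr = [[None] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                # 0.0 stands for "start a new path at this cell".
                best_prev, best_cell = 0.0, None
                if prev is not None:
                    for py in range(max(0, y - radius), min(h, y + radius + 1)):
                        for px in range(max(0, x - radius), min(w, x + radius + 1)):
                            if prev[py][px] > best_prev:
                                best_prev, best_cell = prev[py][px], (py, px)
                acc[y][x] = frame[y][x] + best_prev
                ptr[y][x] = best_cell
                if acc[y][x] > best_score:
                    best_score, best_end = acc[y][x], (t, y, x)
        back.append(ptr)
        prev = acc

    # Walk the back-pointers from the best end cell to the start of the path.
    path: List[Cell] = []
    t, y, x = best_end
    while True:
        path.append((t, y, x))
        pred = back[t][y][x]
        if pred is None:
            break
        t, (y, x) = t - 1, pred
    path.reverse()
    return best_score, path


# Usage: a positive response that drifts one cell to the right per frame.
frames = [[[1.0, -1.0, -1.0]], [[-1.0, 2.0, -1.0]], [[-1.0, -1.0, 3.0]]]
score, path = max_path(frames)
```

Because extending a path is only worthwhile when the accumulated score so far is positive, the recurrence trims low-scoring frames at either end of the path, much like a maximum-subarray scan extended over space and time.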