
Object Referring in Videos with Language and Human Gaze



Abstract

We investigate the problem of object referring (OR), i.e., localizing a target object in a visual scene given a language description. Humans perceive the world more as continued video snippets than as static images, and describe objects not only by their appearance, but also by their spatio-temporal contexts and motion features. Humans also gaze at the object when they issue a referring expression. Existing works for OR mostly focus on static images only, which fall short in providing many such cues. This paper addresses OR in videos with language and human gaze. To that end, we present a new video dataset for OR, with 30,000 objects over 5,000 stereo video sequences annotated with their descriptions and gaze. We further propose a novel network model for OR in videos, integrating appearance, motion, gaze, and spatio-temporal contextual information into one network. Experimental results show that our method effectively utilizes motion cues, human gaze, and spatio-temporal context information, and outperforms previous OR methods. The dataset and code will be made available.
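The abstract describes fusing appearance, motion, gaze, and spatio-temporal context cues to rank candidate objects against a language query. The sketch below is only an illustration of that general late-fusion idea, not the paper's actual network: the feature names, the cosine-similarity scoring, and the fixed fusion weights are all assumptions made for the example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb + 1e-8)

def fuse_cues(region, lang_emb, weights=(0.4, 0.2, 0.2, 0.2)):
    """Score one candidate region against a language embedding by
    combining similarities of its appearance, motion, and context
    features with a precomputed gaze-proximity score in [0, 1].
    Weights are illustrative placeholders, not learned values."""
    w_app, w_mot, w_ctx, w_gaze = weights
    return (w_app * cosine(region["appearance"], lang_emb)
            + w_mot * cosine(region["motion"], lang_emb)
            + w_ctx * cosine(region["context"], lang_emb)
            + w_gaze * region["gaze"])

def localize(candidates, lang_emb):
    """Return the index of the highest-scoring candidate region."""
    return max(range(len(candidates)),
               key=lambda i: fuse_cues(candidates[i], lang_emb))
```

In the paper's setting each cue would come from a learned sub-network and the fusion would be trained end-to-end; here the point is only that the cues combine into a single ranking score per candidate.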
