
Object Referring in Videos with Language and Human Gaze



Abstract

We investigate the problem of object referring (OR), i.e., localizing a target object in a visual scene given a language description. Humans perceive the world more as continued video snippets than as static images, and describe objects not only by their appearance, but also by their spatio-temporal contexts and motion features. Humans also gaze at the object when they issue a referring expression. Existing works for OR mostly focus on static images only, which fall short in providing many such cues. This paper addresses OR in videos with language and human gaze. To that end, we present a new video dataset for OR, with 30,000 objects over 5,000 stereo video sequences annotated with their descriptions and gaze. We further propose a novel network model for OR in videos, integrating appearance, motion, gaze, and spatio-temporal contextual information into one network. Experimental results show that our method effectively utilizes motion cues, human gaze, and spatio-temporal context information, and outperforms previous OR methods. The dataset and code will be made available.
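The abstract describes fusing appearance, motion, gaze, and spatio-temporal context cues to rank candidate objects against a language query. The sketch below is only an illustration of that general late-fusion idea, not the paper's actual network: the feature names, the cosine-similarity scoring, and the fixed fusion weights are all assumptions made for the example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb + 1e-8)

def fuse_cues(region, lang_emb, weights=(0.4, 0.2, 0.2, 0.2)):
    """Score one candidate region against a language embedding by
    combining similarities of its appearance, motion, and context
    features with a precomputed gaze-proximity score in [0, 1].
    Weights are illustrative placeholders, not learned values."""
    w_app, w_mot, w_ctx, w_gaze = weights
    return (w_app * cosine(region["appearance"], lang_emb)
            + w_mot * cosine(region["motion"], lang_emb)
            + w_ctx * cosine(region["context"], lang_emb)
            + w_gaze * region["gaze"])

def localize(candidates, lang_emb):
    """Return the index of the highest-scoring candidate region."""
    return max(range(len(candidates)),
               key=lambda i: fuse_cues(candidates[i], lang_emb))
```

In the paper's setting each cue would come from a learned sub-network and the fusion would be trained end-to-end; here the point is only that the cues combine into a single ranking score per candidate.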
