
Discovering audio-visual associations in narrated videos of human activities.


Abstract

This research presents a novel method for learning the lexical semantics of action verbs. The primary focus is on actions that are directed towards objects, such as kicking a ball or pushing a chair. Specifically, this dissertation presents a robust and scalable method for acquiring grounded lexical semantics by discovering audio-visual associations in narrated videos. The narration associated with the video contains many words, including other verbs that are unrelated to the action; the actual name of the depicted action is only occasionally mentioned by the narrator. More generally, this research presents an algorithm that can reliably and autonomously discover an association between two events, such as the utterance of a verb and the depiction of an action, even when the two events are only loosely correlated with each other.

Semantics is represented in a grounded way by association sets, collections of sensory inputs associated with a high-level concept. Each association set associates video sequences that depict a given action with utterances of the name of the action. The association sets are discovered in an unsupervised way. This dissertation also shows how to extract features from the video and audio for this purpose.

Extensive experimental results are presented. The experiments make use of several hours of video depicting a human performing 13 actions with 6 objects. In addition, the performance of the algorithm was tested with data provided by an external research group. The unsupervised learning algorithm presented in this dissertation has been compared to standard supervised learning algorithms, and the dissertation introduces a number of relevant experimental parameters and various new analysis techniques.

The experimental results show that the algorithm presented in this dissertation successfully discovers the correct associations between video scenes and audio utterances in an unsupervised way, despite the imperfect correlation between the video and audio. The algorithm outperforms standard supervised learning algorithms. Among other things, this research shows that the performance of the algorithm depends mainly on the strength of the correlation between video and audio, the length of the narration associated with each video scene, and the total number of words in the language.
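The abstract does not spell out how the association sets are scored. As a rough, hypothetical illustration of how loosely correlated events, such as verb utterances and visually clustered video scenes, could be associated by co-occurrence statistics, a pointwise mutual information (PMI) sketch might look like the following. All names, and the choice of PMI itself, are assumptions for illustration, not the dissertation's actual method:

```python
import math
from collections import Counter

def discover_associations(observations, min_count=2):
    """Score word-cluster associations by pointwise mutual information.

    observations: list of (scene_cluster, narration_words) pairs, where
    scene_cluster identifies a group of visually similar video clips and
    narration_words is the list of words uttered during that clip.
    Returns {scene_cluster: [(word, pmi), ...]} sorted by descending PMI.
    """
    n = len(observations)
    cluster_count = Counter()   # how often each scene cluster occurs
    word_count = Counter()      # how often each word occurs (per clip)
    joint_count = Counter()     # co-occurrence of (cluster, word)
    for cluster, words in observations:
        cluster_count[cluster] += 1
        for w in set(words):    # count each word once per clip
            word_count[w] += 1
            joint_count[(cluster, w)] += 1

    assoc = {}
    for (cluster, w), c in joint_count.items():
        if c < min_count:
            continue  # ignore rare, likely accidental co-occurrences
        # PMI = log( P(cluster, word) / (P(cluster) * P(word)) )
        pmi = math.log((c / n) /
                       ((cluster_count[cluster] / n) * (word_count[w] / n)))
        assoc.setdefault(cluster, []).append((w, pmi))
    for cluster in assoc:
        assoc[cluster].sort(key=lambda t: -t[1])
    return assoc
```

Words that reliably co-occur with one scene cluster (e.g. "kick" with clips of kicking) receive high PMI, while narration words that appear across all clusters score near zero, which mirrors the abstract's point that the association can be found even though the action's name is uttered only occasionally amid unrelated words.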

Record details

  • Author

    Oezer, Tuna

  • Affiliation

    University of Illinois at Urbana-Champaign

  • Degree grantor: University of Illinois at Urbana-Champaign
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2008
  • Pages: 143 p.
  • Total pages: 143
  • Format: PDF
  • Language: eng
  • CLC classification: Automation and computer technology
  • Keywords
