
Discovering audio-visual associations in narrated videos of human activities.


Abstract

This research presents a novel method for learning the lexical semantics of action verbs. The primary focus is on actions that are directed towards objects, such as kicking a ball or pushing a chair. Specifically, this dissertation presents a robust and scalable method for acquiring grounded lexical semantics by discovering audio-visual associations in narrated videos. The narration associated with the video contains many words, including other verbs that are unrelated to the action; the actual name of the depicted action is only occasionally mentioned by the narrator. More generally, this research presents an algorithm that can reliably and autonomously discover an association between two events, such as the utterance of a verb and the depiction of an action, even when the two events are only loosely correlated with each other.

Semantics is represented in a grounded way by association sets, collections of sensory inputs associated with a high-level concept. Each association set associates video sequences that depict a given action with utterances of the name of the action. The association sets are discovered in an unsupervised way. This dissertation also shows how to extract features from the video and audio for this purpose.

Extensive experimental results are presented. The experiments make use of several hours of video depicting a human performing 13 actions with 6 objects. In addition, the performance of the algorithm was tested with data provided by an external research group. The unsupervised learning algorithm presented in this dissertation has been compared to standard supervised learning algorithms, and the dissertation introduces a number of relevant experimental parameters and various new analysis techniques.

The experimental results show that the algorithm presented in this dissertation successfully discovers the correct associations between video scenes and audio utterances in an unsupervised way, despite the imperfect correlation between the video and audio. The algorithm outperforms standard supervised learning algorithms. Among other things, this research shows that the performance of the algorithm depends mainly on the strength of the correlation between video and audio, the length of the narration associated with each video scene, and the total number of words in the language.
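The abstract does not spell out how the association sets are scored. As a rough, hypothetical illustration of how loosely correlated events, such as verb utterances and visually clustered video scenes, could be associated by co-occurrence statistics, a pointwise mutual information (PMI) sketch might look like the following. All names, and the choice of PMI itself, are assumptions for illustration, not the dissertation's actual method:

```python
import math
from collections import Counter

def discover_associations(observations, min_count=2):
    """Score word-cluster associations by pointwise mutual information.

    observations: list of (scene_cluster, narration_words) pairs, where
    scene_cluster identifies a group of visually similar video clips and
    narration_words is the list of words uttered during that clip.
    Returns {scene_cluster: [(word, pmi), ...]} sorted by descending PMI.
    """
    n = len(observations)
    cluster_count = Counter()   # how often each scene cluster occurs
    word_count = Counter()      # how often each word occurs (per clip)
    joint_count = Counter()     # co-occurrence of (cluster, word)
    for cluster, words in observations:
        cluster_count[cluster] += 1
        for w in set(words):    # count each word once per clip
            word_count[w] += 1
            joint_count[(cluster, w)] += 1

    assoc = {}
    for (cluster, w), c in joint_count.items():
        if c < min_count:
            continue  # ignore rare, likely accidental co-occurrences
        # PMI = log( P(cluster, word) / (P(cluster) * P(word)) )
        pmi = math.log((c / n) /
                       ((cluster_count[cluster] / n) * (word_count[w] / n)))
        assoc.setdefault(cluster, []).append((w, pmi))
    for cluster in assoc:
        assoc[cluster].sort(key=lambda t: -t[1])
    return assoc
```

Words that reliably co-occur with one scene cluster (e.g. "kick" with clips of kicking) receive high PMI, while narration words that appear across all clusters score near zero, which mirrors the abstract's point that the association can be found even though the action's name is uttered only occasionally amid unrelated words.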

Record details

  • Author

    Oezer, Tuna

  • Affiliation

    University of Illinois at Urbana-Champaign

  • Degree grantor: University of Illinois at Urbana-Champaign
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2008
  • Pages: 143 p.
  • Total pages: 143
  • Format: PDF
  • Language: eng
  • CLC classification: Automation and computer technology
  • Keywords
