Unsupervised Alignment of Natural Language with Video.

Abstract

Today we encounter large amounts of video data, often accompanied by text descriptions (e.g., cooking videos and recipes, videos of wet-lab experiments and protocols, movies and scripts). Extracting meaningful information from these multimodal sequences requires aligning the video frames with the corresponding sentences in the text. Previous methods for connecting language and video relied on manual annotations, which are often tedious and expensive to collect. In this thesis, we focus on automatically aligning sentences with the corresponding video frames without any direct human supervision.

We first propose two hierarchical generative alignment models, which jointly align each sentence with the corresponding video frames and each noun in a sentence with the corresponding object in the video frames. Next, we propose several latent-variable discriminative alignment models, which incorporate rich features involving verbs and video actions and outperform the generative models. Our alignment algorithms are primarily applied to aligning biological wet-lab videos with text instructions. Furthermore, we extend our alignment models to automatically align movie scenes with their associated scripts and to learn word-level translations between language pairs for which bilingual training data is unavailable.

Thesis: By exploiting the temporal ordering constraints between video and associated text, it is possible to automatically align the sentences in the text with the corresponding video frames without any direct human supervision.
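
The thesis statement rests on the idea that temporal ordering restricts which frames a sentence can align to. A small dynamic program can illustrate that constraint. This is a minimal sketch, not the thesis's own models (those are hierarchical generative and latent-variable discriminative models). It assumes a precomputed sentence-frame compatibility matrix `sim`, and the function name `monotonic_align` is hypothetical. The program finds the highest-scoring assignment in which every frame belongs to exactly one sentence, sentences appear in the same order as in the text, and each sentence receives at least one frame.

```python
import math
from typing import List, Sequence, Tuple


def monotonic_align(sim: Sequence[Sequence[float]]) -> Tuple[List[int], float]:
    """Hypothetical illustration of order-preserving sentence-to-frame alignment.

    sim[s][f] is an assumed, precomputed compatibility score between sentence s
    and video frame f. Each frame is assigned to one sentence, sentence indices
    never decrease over time, and every sentence gets at least one frame.
    Returns (assignment[f] = sentence index, total score).
    """
    n_sent = len(sim)
    n_frames = len(sim[0]) if n_sent else 0
    if n_sent == 0 or n_frames < n_sent:
        raise ValueError("need at least one sentence and one frame per sentence")

    neg_inf = -math.inf
    # dp[s][f]: best score for frames 0..f when frame f is assigned to sentence s
    dp = [[neg_inf] * n_frames for _ in range(n_sent)]
    # back[s][f]: sentence assigned to frame f-1 on the best path into (s, f)
    back = [[0] * n_frames for _ in range(n_sent)]
    dp[0][0] = sim[0][0]  # the first frame must belong to the first sentence

    for f in range(1, n_frames):
        for s in range(n_sent):
            stay = dp[s][f - 1]                               # frame f-1 also on sentence s
            advance = dp[s - 1][f - 1] if s > 0 else neg_inf  # move on to the next sentence
            if stay >= advance:
                best, back[s][f] = stay, s
            else:
                best, back[s][f] = advance, s - 1
            if best > neg_inf:  # leave unreachable states at -inf
                dp[s][f] = best + sim[s][f]

    # Backtrack from the last frame, which must belong to the last sentence.
    assignment = [0] * n_frames
    s = n_sent - 1
    for f in range(n_frames - 1, -1, -1):
        assignment[f] = s
        s = back[s][f]
    return assignment, dp[n_sent - 1][n_frames - 1]
```

For example, `monotonic_align([[9, 8, 1, 0], [1, 2, 7, 9]])` assigns the first two frames to sentence 0 and the last two to sentence 1, returning `([0, 0, 1, 1], 33)`. Because each state has only two predecessors, the ordering constraint reduces the search to O(S·F) work for S sentences and F frames, instead of an enumeration of all possible assignments.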

Bibliographic details

  • Author: Naim, Iftekhar
  • Affiliation: University of Rochester
  • Degree-granting institution: University of Rochester
  • Subject: Computer science; Artificial intelligence
  • Degree: Ph.D.
  • Year: 2015
  • Pages: 143
  • Original format: PDF
  • Language: English