Conference on the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Discriminative Unsupervised Alignment of Natural Language Instructions with Corresponding Video Segments

Abstract

We address the problem of automatically aligning natural language sentences with corresponding video segments without any direct supervision. Most existing algorithms for integrating language with video rely on hand-aligned parallel data, in which each natural language sentence is manually aligned with its corresponding image or video segment. Recently, fully unsupervised alignment of text with video has been shown to be feasible using hierarchical generative models. In contrast to these generative models, we propose three latent-variable discriminative models for the unsupervised alignment task. The proposed discriminative models can incorporate domain knowledge by adding diverse and overlapping features. The results show that the discriminative models outperform the generative models in alignment accuracy.