Journal of visual communication & image representation

Translating video into language by enhancing visual and language representations

Abstract

Automatically translating videos into natural language by computer is a fundamental task. Deep-learning-based models for video description have recently achieved major breakthroughs. However, considerable static information is lost when the motion features of a video are encoded, and the linguistic features produced by the LSTM network lack personalized expression, which leads to inappropriate words and poor semantics in the generated sentences. This work proposes a model with enhanced visual and language features to address these challenges. First, static features of video frames from the first LSTM layer are incorporated and then fed into another LSTM layer in frame order. Second, the word feature is combined with the output of the LSTM network to predict the probability of each candidate word at every time step. Experimental results demonstrate the effectiveness of the proposed approach, which performs competitively with other state-of-the-art methods on various metrics.
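
The two enhancements described in the abstract can be illustrated with a minimal sketch. The abstract does not say how features are "incorporated" or "combined", so plain concatenation is assumed here, and the LSTM layers themselves are left out: their outputs are stood in for by toy vectors. The function names (`fuse_visual`, `predict_word_probs`), the dimensions, and the random weights are all hypothetical and are not taken from the paper.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_visual(static_feats, lstm1_outputs):
    """Visual enhancement (assumed form): concatenate each frame's static
    feature with the first-layer output for the same frame, keeping frame
    order, so the result can be fed to a second LSTM layer."""
    assert len(static_feats) == len(lstm1_outputs)
    return [s + h for s, h in zip(static_feats, lstm1_outputs)]


def predict_word_probs(lstm_output, word_feat, weights, bias):
    """Language enhancement (assumed form): concatenate the LSTM output
    with the word feature, project to one score per candidate word, and
    normalize the scores into probabilities."""
    x = lstm_output + word_feat
    scores = [
        sum(w * v for w, v in zip(row, x)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(scores)


# Toy sizes and random parameters, chosen only for illustration.
rng = random.Random(0)
hidden, embed, vocab = 4, 3, 5
lstm_output = [rng.uniform(-1, 1) for _ in range(hidden)]
word_feat = [rng.uniform(-1, 1) for _ in range(embed)]
weights = [
    [rng.uniform(-1, 1) for _ in range(hidden + embed)]
    for _ in range(vocab)
]
bias = [0.0] * vocab

# One time step of word prediction: a distribution over the vocabulary.
probs = predict_word_probs(lstm_output, word_feat, weights, bias)

# Two frames: 2-d static features fused with 1-d first-layer outputs.
fused = fuse_visual([[0.1, 0.2], [0.3, 0.4]], [[1.0], [2.0]])
```

In a real model the projection weights would be learned jointly with the LSTM layers, and the fused frame vectors would pass through the second LSTM rather than being used directly.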
