International Conference on Image, Vision and Computing

Generating video description with Long-Short Term Memory



Abstract

Connecting visual imagery with descriptive language is a challenge for computer vision and machine translation. Inspired by image description work, which uses an `encoder-decoder' model to translate an image into a target sentence, we propose an approach that generates descriptions for video. Unlike an image, which records information at a single moment, a video has temporal structure, so generating a video description requires encoding dynamic temporal structure. Our model takes both global and local information into account. First, our approach extracts features from sampled frames with a Convolutional Neural Network (CNN) pre-trained for image classification. Second, we obtain the global feature of the video by max pooling the frame features. Third, we split the Long-Short Term Memory (LSTM) network into two parts: one encodes the frame features into a local feature, and the other decodes the combined global and local information into the target sentence. Finally, we compare two variants of our model with recent work using BLEU metrics on the YouTube dataset.
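The global-feature step described above (element-wise max pooling of per-frame CNN features over time) can be sketched as follows. This is a minimal illustration, not the authors' code: the frame count, the 4096-dimensional feature size, and the random stand-in features are assumptions, and the mean is used only as a placeholder for the LSTM-encoded local feature.

```python
import numpy as np

# Illustrative setup: 26 sampled frames, each represented by a 4096-d
# CNN feature vector (random stand-ins for real fc-layer activations).
rng = np.random.default_rng(0)
frame_features = rng.standard_normal((26, 4096))

# Global video feature: element-wise max over the time axis, so each
# dimension keeps its strongest response across all sampled frames.
global_feature = frame_features.max(axis=0)   # shape (4096,)

# Placeholder for the local feature the encoder LSTM would produce;
# the real model encodes the frame sequence, here we just average it.
local_feature = frame_features.mean(axis=0)   # shape (4096,)

# The decoder conditions on both global and local information; a simple
# way to combine them is concatenation before decoding into a sentence.
decoder_input = np.concatenate([global_feature, local_feature])  # (8192,)
```

Max pooling makes the global feature invariant to where in the video a response occurs, which is why it complements the order-sensitive LSTM encoding of the same frames.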
