International Conference on Image, Vision and Computing

Generating video description with Long-Short Term Memory



Abstract

Connecting visual imagery with descriptive language is a challenge for computer vision and machine translation. Inspired by image description work, which uses an `encoder-decoder' model to translate an image into a target sentence, we propose an approach that generates descriptions for video. Unlike an image, which records information at a single moment, video has a time-series property, so generating a video description requires encoding dynamic temporal structure. Our model takes into account both global and local information. First, our approach extracts the features of sampled frames with a Convolutional Neural Network (CNN) pre-trained for image classification. Second, we obtain the global feature of the video by max pooling the frame features. Third, we divide the Long-Short Term Memory (LSTM) into two parts: one encodes the frame features into a local feature, and the other decodes the combined global and local information into the target sentence. Finally, we compare two variants of our model with recent work using BLEU metrics on the YouTube dataset.
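The pipeline the abstract describes (CNN frame features, max-pooled global feature, LSTM encoder for the local feature, fused input for the decoder) can be sketched as follows. This is a minimal illustration with a toy NumPy LSTM cell and random stand-in frame features; the dimensions, the concatenation-based fusion, and all variable names are assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for CNN outputs: 8 sampled frames, 16-dim feature each (assumed sizes).
frame_feats = rng.standard_normal((8, 16))

# Global video feature: element-wise max pooling over the frame axis.
global_feat = frame_feats.max(axis=0)          # shape (16,)

def lstm_step(x, h, c, W, U, b):
    """One LSTM step; gate pre-activations stacked as [input, forget, output, candidate]."""
    z = W @ x + U @ h + b                      # shape (4 * hidden_size,)
    H = h.size
    i, f, o = (1.0 / (1.0 + np.exp(-z[k * H:(k + 1) * H])) for k in range(3))
    g = np.tanh(z[3 * H:])                     # candidate cell state
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

H = 16                                         # assumed hidden size
W = rng.standard_normal((4 * H, 16)) * 0.1     # input weights
U = rng.standard_normal((4 * H, H)) * 0.1      # recurrent weights
b = np.zeros(4 * H)

# Encoder LSTM: run over the sampled frames; final hidden state is the local feature.
h, c = np.zeros(H), np.zeros(H)
for x in frame_feats:
    h, c = lstm_step(x, h, c, W, U, b)
local_feat = h

# Decoder input carries both global and local information (fused here by
# concatenation as one plausible choice); a decoder LSTM would emit the sentence.
decoder_input = np.concatenate([global_feat, local_feat])   # shape (32,)
```

In a real system the random `frame_feats` would come from a pre-trained CNN and the decoder would be a second trained LSTM over the vocabulary; the sketch only shows how global and local features are produced and combined.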
