Multi-disciplinary International Workshop on Artificial Intelligence

Learning Generalized Video Memory for Automatic Video Captioning



Abstract

Recent video captioning methods have made great progress through deep learning approaches based on convolutional neural networks (CNN) and recurrent neural networks (RNN). While some techniques use memory networks for sentence decoding, little work has leveraged the memory component to learn and generalize the temporal structure in video. In this paper, we propose a new method, Generalized Video Memory (GVM), which utilizes a memory model to enhance video description generation. Based on a class of self-organizing neural networks, the GVM model is able to learn new video features incrementally. The learned generalized memory is further exploited to decode the associated sentences using an RNN. We evaluate our method on the YouTube2Text data set using BLEU and METEOR scores as a standard benchmark. Our results are shown to be competitive against other state-of-the-art methods.
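The abstract only outlines the pipeline: CNN frame features are absorbed incrementally by a self-organizing memory, and the resulting generalized memory conditions an RNN that decodes the caption. The sketch below illustrates that flow under explicit assumptions; it is not the authors' implementation. The winner-take-all prototype update standing in for the self-organizing network, the slot count, the GRU decoder, and all dimensions are illustrative choices.

```python
# Minimal sketch of the GVM-style pipeline described in the abstract.
# All architectural details (prototype update rule, sizes, GRU decoder)
# are assumptions for illustration, not the paper's actual model.
import torch
import torch.nn as nn

class GeneralizedVideoMemory(nn.Module):
    """Toy self-organizing memory: a set of prototype vectors updated
    incrementally toward incoming video features with a simple
    winner-take-all rule (stand-in for the paper's self-organizing network)."""
    def __init__(self, num_slots=64, feat_dim=512, lr=0.1):
        super().__init__()
        self.register_buffer("prototypes", torch.randn(num_slots, feat_dim))
        self.lr = lr

    @torch.no_grad()
    def update(self, feats):
        # feats: (T, feat_dim) CNN features for the frames of one video.
        for f in feats:
            dists = torch.cdist(f.unsqueeze(0), self.prototypes).squeeze(0)
            winner = torch.argmin(dists)          # best-matching memory slot
            self.prototypes[winner] += self.lr * (f - self.prototypes[winner])

    def read(self, feats):
        # Soft attention over memory slots yields a generalized video code.
        attn = torch.softmax(feats @ self.prototypes.T, dim=-1)   # (T, num_slots)
        return (attn @ self.prototypes).mean(dim=0)               # (feat_dim,)

class CaptionDecoder(nn.Module):
    """GRU decoder conditioned on the memory read-out (RNN type is assumed)."""
    def __init__(self, vocab_size, feat_dim=512, hidden=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.init_h = nn.Linear(feat_dim, hidden)
        self.gru = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, video_code, tokens):
        # video_code: (B, feat_dim); tokens: (B, L) ground-truth word ids.
        h0 = torch.tanh(self.init_h(video_code)).unsqueeze(0)     # (1, B, hidden)
        out, _ = self.gru(self.embed(tokens), h0)
        return self.out(out)                      # (B, L, vocab_size) logits

# Usage with random stand-ins for CNN frame features and a caption.
memory = GeneralizedVideoMemory()
decoder = CaptionDecoder(vocab_size=10000)
frame_feats = torch.randn(20, 512)                # 20 frames of CNN features
memory.update(frame_feats)                        # incremental memory learning
code = memory.read(frame_feats).unsqueeze(0)      # (1, 512)
caption = torch.randint(0, 10000, (1, 12))        # dummy token ids
logits = decoder(code, caption)
print(logits.shape)                               # torch.Size([1, 12, 10000])
```

In this sketch the memory is updated without backpropagation, which is what allows new video features to be absorbed incrementally, while the decoder would be trained in the usual supervised fashion against reference captions.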
