
Video Description with Integrated Visual and Textual Information

Abstract

Video description aims to automatically generate natural-language descriptions of videos. Owing to the large volume of multi-modal data and the success of Deep Neural Networks (DNNs), a wide range of models have been proposed. However, previous models learn insufficient linguistic information, or insufficient correlation between the visual and textual modalities. To address these problems, this paper proposes an integrated model based on Long Short-Term Memory (LSTM), named VD-ivt. The model consists of three parallel channels: a primary video description channel, a sentence-to-sentence channel for language learning, and a channel that integrates visual and textual information. During training, the three parallel channels are connected through LSTM weight matrices. VD-ivt is evaluated on two publicly available datasets, YouTube2Text and LSMDC. Experimental results demonstrate that the proposed model outperforms the benchmark models.
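
The abstract states only that the three channels run in parallel and are connected by LSTM weight matrices during training. The Python sketch below illustrates one possible reading of that design and is not the authors' implementation. It assumes a single standard LSTM cell whose weight matrices (W, U, b) are shared by all three channels, assumes video features and word embeddings have already been projected to a common dimension D, and uses a hypothetical integration channel that adds the mean-pooled video feature to each word embedding. The names SharedLSTMCell and three_channel_forward are illustrative only.

```python
import numpy as np


class SharedLSTMCell:
    """A standard LSTM cell; in this sketch one instance is reused by all channels."""

    def __init__(self, input_dim: int, hidden_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.hidden_dim = hidden_dim
        # Gate parameters stacked as [input, forget, output, candidate].
        self.W = rng.normal(0.0, 0.1, (4 * hidden_dim, input_dim))   # input-to-hidden
        self.U = rng.normal(0.0, 0.1, (4 * hidden_dim, hidden_dim))  # hidden-to-hidden
        self.b = np.zeros(4 * hidden_dim)

    @staticmethod
    def _sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def step(self, x, h, c):
        H = self.hidden_dim
        z = self.W @ x + self.U @ h + self.b
        i = self._sigmoid(z[:H])           # input gate
        f = self._sigmoid(z[H:2 * H])      # forget gate
        o = self._sigmoid(z[2 * H:3 * H])  # output gate
        g = np.tanh(z[3 * H:])             # candidate cell state
        c = f * c + i * g
        h = o * np.tanh(c)
        return h, c

    def run(self, xs):
        """Run the cell over a (T, input_dim) sequence and return (T, hidden_dim) states."""
        h = np.zeros(self.hidden_dim)
        c = np.zeros(self.hidden_dim)
        states = []
        for x in xs:
            h, c = self.step(x, h, c)
            states.append(h)
        return np.stack(states)


def three_channel_forward(cell, video_feats, word_embs):
    """Hypothetical forward pass in which all three channels share one LSTM's weights."""
    # Channel 1 (video description): frame features first, then the caption words.
    video_desc = cell.run(np.concatenate([video_feats, word_embs], axis=0))
    # Channel 2 (sentence-to-sentence): language-only input.
    sent_to_sent = cell.run(word_embs)
    # Channel 3 (integration, assumed form): each word embedding plus the mean-pooled video feature.
    integrated = cell.run(word_embs + video_feats.mean(axis=0))
    return {
        "video_description": video_desc,
        "sentence_to_sentence": sent_to_sent,
        "integration": integrated,
    }


rng = np.random.default_rng(1)
D, H = 8, 16
cell = SharedLSTMCell(D, H)
video = rng.normal(size=(5, D))  # 5 frames, assumed already projected to D dims
words = rng.normal(size=(7, D))  # 7 word embeddings of dimension D
outs = three_channel_forward(cell, video, words)
for name, states in outs.items():
    print(name, states.shape)
# video_description (12, 16)
# sentence_to_sentence (7, 16)
# integration (7, 16)
```

Because every channel calls the same cell, a loss computed on any one channel would update the same matrices; under this assumption, that is how the language-only and integration channels could inform the primary description channel. The word-prediction layer, training loop, and decoding are omitted.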

Bibliographic Information

  • Source
    China Communications, 2019, No. 1, pp. 119-128 (10 pages)
  • Author affiliations

    Beijing University of Posts and Telecommunications, Beijing 100876, China (all three authors)

  • Original format: PDF
  • Language: English