ACM Transactions on Multimedia Computing, Communications, and Applications
Rich Visual and Language Representation with Complementary Semantics for Video Captioning

Abstract

Translating a video into natural-language sentences that describe its content is both interesting and challenging. In this work, an advanced framework is built to generate coherent, semantically rich sentences for video captioning. First, a long short-term memory (LSTM) network with an improved factored way is developed, inspired by the LSTM with a conventional factored way and by the common practice of feeding multi-modal features into the LSTM at the first time step for visual description. Then, the LSTM networks with the proposed improved factored way and with the un-factored way are combined, and a voting strategy is used to predict candidate words. In addition, for robust and abstract visual and language representation, residuals, as learned from the residual network (ResNet), are employed to enhance the gradient signals, and a deeper LSTM network is constructed. Furthermore, three convolutional neural network (CNN) based features, extracted from GoogLeNet, ResNet101, and ResNet152, are fused to capture more comprehensive and complementary visual information. Experiments are conducted on two benchmark datasets, MSVD and MSR-VTT2016, and the proposed techniques achieve competitive performance compared with other state-of-the-art methods.
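
The abstract names two combination steps, fusing features from three CNNs and voting between two LSTM variants, but does not say which operators are used. The sketch below is only an illustration under stated assumptions: fusion is taken to be plain concatenation, and voting is taken to be averaging the two next-word probability distributions and picking the top word. The function names `fuse_features` and `vote_next_word`, the toy vocabulary, and the numbers are all hypothetical and do not come from the paper.

```python
from typing import List, Sequence, Tuple


def fuse_features(*features: Sequence[float]) -> List[float]:
    """Fuse per-video feature vectors by concatenation.

    Assumption: the paper's fusion operator is not given in the abstract;
    concatenation is used here purely for illustration.
    """
    fused: List[float] = []
    for vec in features:
        fused.extend(vec)
    return fused


def vote_next_word(
    prob_factored: Sequence[float],
    prob_unfactored: Sequence[float],
    vocab: Sequence[str],
) -> Tuple[str, List[float]]:
    """Combine two next-word distributions and return the winning word.

    Assumption: "voting" is modelled as an unweighted average of the two
    distributions; the paper may use a different rule.
    """
    if not (len(prob_factored) == len(prob_unfactored) == len(vocab)):
        raise ValueError("distributions and vocabulary must have equal length")
    averaged = [(a + b) / 2.0 for a, b in zip(prob_factored, prob_unfactored)]
    # Index of the highest averaged probability.
    best = max(range(len(vocab)), key=averaged.__getitem__)
    return vocab[best], averaged


if __name__ == "__main__":
    # Toy stand-ins for GoogLeNet, ResNet101, and ResNet152 features.
    fused = fuse_features([0.1, 0.2], [0.3], [0.4, 0.5])
    print(len(fused))

    vocab = ["a", "man", "dog", "runs"]
    word, _ = vote_next_word([0.1, 0.5, 0.3, 0.1], [0.1, 0.2, 0.6, 0.1], vocab)
    print(word)
```

In this toy case the two models disagree on the most likely word, and the averaged distribution decides between them. A real system would apply such a rule at every decoding step.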
