Conference: International Joint Conference on Natural Language Processing

Guiding the Flowing of Semantics: Interpretable Video Captioning via POS Tag

Abstract

In current video captioning models, the video frames are collected in one network and their semantics are mixed into one feature, which not only increases the difficulty of caption decoding but also decreases the interpretability of the captioning models. To address these problems, we propose an Adaptive Semantic Guidance Network (ASGN), which instantiates the whole-video semantics as different POS-aware semantics under the supervision of part-of-speech (POS) tags. In the encoding process, the POS tag activates the related neurons and parses the whole semantic information into corresponding encoded video representations. Furthermore, the potential of the model is stimulated by the POS-aware video features. In the decoding process, the video features related to nouns and verbs are used as supervision to construct a new adaptive attention model that can decide whether or not to attend to the video feature. By explicitly improving the interpretability of the network, the learning process becomes more transparent and the results more predictable. Extensive experiments demonstrate the effectiveness of our model compared with state-of-the-art models.
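The abstract does not give the form of the adaptive attention model, so the following is only a minimal sketch of a generic gated attention step, not ASGN itself. It assumes dot-product scoring, a sigmoid gate computed from the decoder state, and a fallback to the decoder state when the gate is closed; the names `adaptive_attention`, `gate_weights`, and `gate_bias` are hypothetical.

```python
import math


def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def adaptive_attention(hidden, frame_feats, gate_weights, gate_bias=0.0):
    """Illustrative gated attention (an assumption, not the paper's model).

    hidden       -- decoder state, a list of floats
    frame_feats  -- one feature vector per frame, same length as hidden
    gate_weights -- hypothetical gate parameters, same length as hidden
    Returns (mixed_context, gate, attention_weights).
    """
    # Attention over frames, scored by dot product with the decoder state.
    weights = softmax([dot(hidden, f) for f in frame_feats])
    dim = len(hidden)
    visual = [sum(w * f[d] for w, f in zip(weights, frame_feats))
              for d in range(dim)]
    # Gate in (0, 1): near 1 means "attend to the video feature",
    # near 0 means "rely on the decoder state instead".
    gate = 1.0 / (1.0 + math.exp(-(dot(gate_weights, hidden) + gate_bias)))
    mixed = [gate * v + (1.0 - gate) * h for v, h in zip(visual, hidden)]
    return mixed, gate, weights
```

In a captioning decoder, a gate of this kind would typically be expected to open for visually grounded words such as nouns and verbs and to close for function words; the abstract states that noun- and verb-related video features supervise this decision, which the sketch does not model.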