
Guiding the Flowing of Semantics: Interpretable Video Captioning via POS Tag



Abstract

In current video captioning models, the video frames are collected in one network and the semantics are mixed into a single feature, which not only increases the difficulty of caption decoding but also decreases the interpretability of the captioning models. To address these problems, we propose an Adaptive Semantic Guidance Network (ASGN), which instantiates the whole video semantics into different POS-aware semantics under the supervision of part-of-speech (POS) tags. In the encoding process, the POS tag activates the related neurons and parses the whole semantic information into corresponding encoded video representations. Furthermore, the potential of the model is stimulated by the POS-aware video features. In the decoding process, the related video features of nouns and verbs are used as supervision to construct a new adaptive attention model that can decide whether or not to attend to the video features. With the explicit improvement in the interpretability of the network, the learning process is more transparent and the results are more predictable. Extensive experiments demonstrate the effectiveness of our model compared with state-of-the-art models.
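The abstract does not spell out how the adaptive attention decides whether to attend to the video features. As a rough illustration only, the sketch below uses a sentinel-style gate, a common adaptive-attention design that is swapped in here as an assumption and is not confirmed to be the ASGN mechanism. The function name `adaptive_attention`, the `sentinel` vector, and the dot-product scoring are all hypothetical choices made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def adaptive_attention(hidden, video_feats, sentinel):
    """Sentinel-gated attention (an assumed stand-in, not the paper's model).

    hidden      -- decoder state, list of floats
    video_feats -- one feature vector per frame (e.g. noun- or verb-aware)
    sentinel    -- hypothetical non-visual vector; its attention weight
                   acts as the gate for "do not attend to the video"

    Returns (context, beta, frame_weights), where beta is the share of
    attention mass that goes to the sentinel instead of the frames.
    """
    # Score every frame and the sentinel against the decoder state.
    scores = [dot(hidden, feat) for feat in video_feats]
    scores.append(dot(hidden, sentinel))

    # One softmax over frames plus sentinel, so the weights sum to 1.
    weights = softmax(scores)
    beta = weights[-1]

    # The context vector mixes frame features and the sentinel.
    candidates = video_feats + [sentinel]
    context = [
        sum(w * cand[i] for w, cand in zip(weights, candidates))
        for i in range(len(hidden))
    ]
    return context, beta, weights[:-1]


if __name__ == "__main__":
    frames = [[1.0, 0.0], [1.0, 0.0]]
    sentinel = [0.0, 1.0]
    # A state aligned with the sentinel yields a large beta (little
    # visual attention); a state aligned with the frames yields a small one.
    print(adaptive_attention([0.0, 4.0], frames, sentinel)[1])
    print(adaptive_attention([4.0, 0.0], frames, sentinel)[1])
```

In this toy setup a large `beta` corresponds to words that need little visual grounding, while a small `beta` corresponds to words such as nouns and verbs that depend on the video features. In a trained model the scoring function and the sentinel would be learned rather than fixed.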
