Guiding the Flowing of Semantics: Interpretable Video Captioning via POS Tag


Abstract

In current video captioning models, the video frames are collected in one network and the semantics are mixed into one feature, which not only increases the difficulty of caption decoding but also decreases the interpretability of the captioning models. To address these problems, we propose an Adaptive Semantic Guidance Network (ASGN), which instantiates the whole video semantics into different POS-aware semantics under the supervision of part-of-speech (POS) tags. In the encoding process, the POS tag activates the related neurons and parses the whole semantic information into corresponding encoded video representations. Furthermore, the potential of the model is stimulated by the POS-aware video features. In the decoding process, the related video features of nouns and verbs are used as supervision to construct a new adaptive attention model that can decide whether or not to attend to the video feature. By explicitly improving the interpretability of the network, the learning process becomes more transparent and the results more predictable. Extensive experiments demonstrate the effectiveness of our model when compared with state-of-the-art models.
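
The adaptive attention mentioned in the abstract, which decides at each decoding step whether to attend to the video feature, can be illustrated with a gated attention sketch. This is only a minimal illustration under stated assumptions: the function names, the dot-product scoring, the scalar sigmoid gate, and the non-visual `fallback` vector are all hypothetical and are not taken from the ASGN paper.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def adaptive_attention(frame_feats, hidden, fallback, gate_weights):
    """Blend an attended video context with a non-visual fallback vector.

    frame_feats  : list of per-frame feature vectors (same length as hidden)
    hidden       : current decoder state (assumed given)
    fallback     : vector used when the decoder does not look at the video
    gate_weights : hypothetical parameters of the scalar gate
    Returns (context, gate, weights).
    """
    # Soft attention over frames, scored by dot product with the decoder state.
    weights = softmax([dot(f, hidden) for f in frame_feats])
    dim = len(hidden)
    visual = [sum(w * f[i] for w, f in zip(weights, frame_feats))
              for i in range(dim)]
    # Gate in (0, 1): near 1 means "attend to the video", near 0 means "do not".
    # Assumption: in training, such a gate could be pushed toward 1 for
    # visual words such as nouns and verbs.
    gate = sigmoid(dot(gate_weights, hidden))
    context = [gate * v + (1.0 - gate) * s for v, s in zip(visual, fallback)]
    return context, gate, weights

# Usage with toy values: two frames, two-dimensional features.
frames = [[1.0, 0.0], [0.0, 1.0]]
context, gate, weights = adaptive_attention(
    frames, hidden=[2.0, 0.0], fallback=[0.0, 0.0], gate_weights=[0.5, 0.5]
)
```

A scalar gate is the simplest choice for this kind of sketch; a per-dimension gate or a learned sentinel vector would follow the same blending pattern.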


Bibliographic details

Source
International Joint Conference on Natural Language Processing | 2019 | cxxxviii, pp. 1941-2589 | 10 pages
Authors
Xinyu Xiao; Lingfeng Wang; Bin Fan; Shiming Xiang; Chunhong Pan
Original format: PDF
Chinese Library Classification: Programming, software engineering

Similar literature


1. Fused GRU with semantic-temporal attention for video captioning [J]. Gao Lianli, Wang Xuanhan, Song Jingkuan, Neurocomputing. 2020, issue Juna28
2. High-Quality Image Captioning With Fine-Grained and Semantic-Guided Visual Attention [J]. Zhang Zongjian, Wu Qiang, Wang Yang, IEEE Transactions on Multimedia. 2019, issue 7
3. Semantic-filtered Soft-Split-Aware video captioning with audio-augmented feature [J]. Xu Yuecong, Yang Jianfei, Mao Kezhi, Neurocomputing. 2019, issue SEPa10
4. Guiding the Flowing of Semantics: Interpretable Video Captioning via POS Tag [C]. Xinyu Xiao, Lingfeng Wang, Bin Fan, International Joint Conference on Natural Language Processing; Conference on Empirical Methods in Natural Language Processing. 2019
5. An enlightened eye and an inquiring mind: Guided video interactions to develop interpretive skills and intellectual modesty [D]. Preston, Michael D. 2010
6. A Semantics-Assisted Video Captioning Model Trained With Scheduled Sampling [O]. Haoran Chen, Ke Lin, Alexander Maye, 2020
7. Spatio-Temporal Dynamics and Semantic Attribute Enriched Visual Encoding for Video Captioning [O]. Nayyer Aafaq, Naveed Akhtar, Wei Liu, 2019
