Guiding the Flowing of Semantics: Interpretable Video Captioning via POS Tag


Abstract

Current video captioning models feed all video frames into a single network and mix their semantics into one feature, which not only increases the difficulty of caption decoding but also decreases the interpretability of the models. To address these problems, we propose an Adaptive Semantic Guidance Network (ASGN), which instantiates the whole-video semantics as different POS-aware semantics under the supervision of part-of-speech (POS) tags. In the encoding process, the POS tag activates the related neurons and parses the whole semantic information into corresponding encoded video representations. Furthermore, the potential of the model is stimulated by the POS-aware video features. In the decoding process, the video features related to nouns and verbs are used as supervision to construct a new adaptive attention model that can decide whether or not to attend to the video feature. Because the interpretability of the network is improved explicitly, the learning process is more transparent and the results are more predictable. Extensive experiments demonstrate the effectiveness of our model compared with state-of-the-art models.
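The decoding idea in the abstract (a gate that decides whether to attend to the video feature, supervised by whether the word is a noun or verb) can be illustrated with a small sketch. This is not the authors' ASGN implementation: the function names, the dot-product scoring, the `sentinel` language-only vector, and the binary cross-entropy gate loss are all assumptions chosen only to make the mechanism concrete.

```python
import math

# Assumption: nouns and verbs are the POS classes that should look at the video.
VISUAL_POS = {"NOUN", "VERB"}


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def adaptive_attention(hidden, video_feats, sentinel, gate_logit):
    """Hypothetical adaptive attention step.

    hidden      -- decoder state (list of floats)
    video_feats -- one feature vector per frame, same length as hidden
    sentinel    -- language-only fallback vector (an assumption, not from the paper)
    gate_logit  -- scalar; its sigmoid is the probability of attending to the video
    """
    # Soft attention weights over frames from dot-product scores.
    weights = softmax([dot(hidden, f) for f in video_feats])
    dim = len(hidden)
    visual_ctx = [sum(w * f[i] for w, f in zip(weights, video_feats))
                  for i in range(dim)]
    # Gate in (0, 1): near 1 uses the video context, near 0 uses the sentinel.
    gate = 1.0 / (1.0 + math.exp(-gate_logit))
    context = [gate * v + (1.0 - gate) * s
               for v, s in zip(visual_ctx, sentinel)]
    return context, gate, weights


def gate_loss(gate, pos_tag, eps=1e-12):
    """Binary cross-entropy pushing the gate toward 1 for nouns and verbs."""
    target = 1.0 if pos_tag in VISUAL_POS else 0.0
    g = min(max(gate, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(g) + (1.0 - target) * math.log(1.0 - g))
```

In a trained model the gate logit would be predicted from the decoder state; here it is passed in directly so the sketch stays self-contained.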


Bibliographic record

Source
International Joint Conference on Natural Language Processing; Conference on Empirical Methods in Natural Language Processing | 2019 | pp. 2068-2077 | 10 pages
Authors
Xinyu Xiao; Lingfeng Wang; Bin Fan; Shiming Xiang; Chunhong Pan
Original format: PDF

Similar literature

1. Fused GRU with semantic-temporal attention for video captioning [J]. Gao Lianli, Wang Xuanhan, Song Jingkuan. Neurocomputing, 2020, Jun. 28 issue.
2. High-Quality Image Captioning With Fine-Grained and Semantic-Guided Visual Attention [J]. Zhang Zongjian, Wu Qiang, Wang Yang. IEEE Transactions on Multimedia, 2019, no. 7.
3. Semantic-filtered Soft-Split-Aware video captioning with audio-augmented feature [J]. Xu Yuecong, Yang Jianfei, Mao Kezhi. Neurocomputing, 2019, Sep. 10 issue.
4. Guiding the Flowing of Semantics: Interpretable Video Captioning via POS Tag [C]. Xinyu Xiao, Lingfeng Wang, Bin Fan. International Joint Conference on Natural Language Processing, 2019.
5. An enlightened eye and an inquiring mind: Guided video interactions to develop interpretive skills and intellectual modesty [D]. Preston, Michael D. 2010.
6. A Semantics-Assisted Video Captioning Model Trained With Scheduled Sampling [O]. Haoran Chen, Ke Lin, Alexander Maye. 2020.
7. Spatio-Temporal Dynamics and Semantic Attribute Enriched Visual Encoding for Video Captioning [O]. Nayyer Aafaq, Naveed Akhtar, Wei Liu. 2019.
