Journal: ACM Transactions on Multimedia Computing, Communications, and Applications

Adaptive Attention-based High-level Semantic Introduction for Image Caption


Abstract

Several prior works integrate a spatial visual attention mechanism into an image captioning model and introduce semantic concepts to guide caption generation. High-level semantic information conveys the abstract and general content of an image, which helps improve model performance. However, the high-level information is always a static representation that does not account for salient elements. This article therefore applies a semantic attention mechanism to the high-level information instead of the conventional static representation. The salient high-level semantic information can be considered as redundant semantic information for image caption generation. Additionally, the generation of visual words and non-visual words can be separated: an adaptive attention mechanism switches the guidance for caption generation between new fusion information (a fusion of image features and high-level semantics) and a language model. Visual words can thus be generated from the image features and high-level semantic information, while non-visual words can be predicted by the language model. The semantic attention, adaptive attention, and previously generated words are fused to construct a special attention module for the input and output of a long short-term memory (LSTM) network. An image caption can then be generated as a concise sentence that accurately captures the rich content of the image. Experimental results show that the proposed model performs promisingly on the evaluation metrics and that the captions are logical and richly descriptive.
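The abstract gives no equations, so the following is only a rough sketch of how switching between fused visual-semantic information and a language model might look. Everything in it is an assumption made for illustration: the function name `adaptive_context`, dot-product attention scores, additive fusion, and a sigmoid gate `beta` are hypothetical choices, not the article's actual formulation.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def adaptive_context(image_feats, semantic_feats, lm_state, query):
    """Blend attended image/semantic information with a language-model vector.

    image_feats:    (R, D) features for R image regions
    semantic_feats: (K, D) embeddings for K high-level concepts
    lm_state:       (D,)   vector summarizing the language model (assumed)
    query:          (D,)   current decoder hidden state
    """
    # Spatial attention: weight image regions by similarity to the query.
    visual = softmax(image_feats @ query) @ image_feats
    # Semantic attention: weight concepts instead of using a static average.
    semantic = softmax(semantic_feats @ query) @ semantic_feats
    # Assumed fusion: a simple sum of the two attended vectors.
    fused = visual + semantic
    # Assumed gate: beta near 1 leans on the language model (non-visual words),
    # beta near 0 leans on the fused image/semantic information (visual words).
    beta = 1.0 / (1.0 + np.exp(-(lm_state @ query)))
    context = beta * lm_state + (1.0 - beta) * fused
    return context, beta
```

In this sketch a function word such as "of" would ideally yield a large `beta`, while a word such as "dog" would yield a small one. In a trained model the gate would be computed from learned parameters rather than a raw dot product.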
