IEEE Transactions on Geoscience and Remote Sensing

Sound Active Attention Framework for Remote Sensing Image Captioning

Abstract

Attention mechanism-based image captioning methods have achieved good results in the remote sensing field, but they are driven by tagged sentences, which is called passive attention. However, different observers may pay different levels of attention to the same image, so the attention of an observer at test time may not be consistent with the attention learned during training. As a direct and natural form of human-machine interaction, speech is much faster than typing sentences, and sound can represent the attention of different observers; this is called active attention. Active attention can describe an image in a more targeted way; for example, in disaster assessment, the situation can be obtained quickly and the areas related to the specific disaster can be located. A novel sound active attention framework is proposed to generate more specific captions according to the interest of the observer. First, sound is modeled by mel-frequency cepstral coefficients (MFCCs) and the image is encoded by convolutional neural networks (CNNs). Then, to handle the continuous nature of sound, a sound module and an attention module are designed based on gated recurrent units (GRUs). Finally, the sound-guided image feature processed by the attention module is fed into the output module to generate a descriptive sentence. Experiments on both synthetic and real sound data sets show that the proposed method can generate sentences that capture the focus of human attention.
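
The pipeline outlined in the abstract (MFCC sound features, CNN image regions, a GRU-based sound module, a sound-guided attention module, and a GRU output module) can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the class name SoundActiveAttentionCaptioner, all layer sizes, the soft-attention formulation, and the teacher-forced decoding loop are illustrative assumptions, shown only to make the flow of sound state, attended image feature, and generated sentence concrete.

```python
# Minimal sketch (not the paper's code) of sound-guided attention captioning:
# a GRU summarizes the MFCC sequence (sound module), its state weights CNN
# image regions (attention module), and a GRU cell decodes the caption
# (output module). Sizes and structure are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SoundActiveAttentionCaptioner(nn.Module):
    def __init__(self, mfcc_dim=13, img_dim=2048, hidden=512, vocab=10000):
        super().__init__()
        self.sound_gru = nn.GRU(mfcc_dim, hidden, batch_first=True)  # sound module
        self.att_img = nn.Linear(img_dim, hidden)
        self.att_snd = nn.Linear(hidden, hidden)
        self.att_score = nn.Linear(hidden, 1)                        # attention module
        self.embed = nn.Embedding(vocab, hidden)
        self.decoder = nn.GRUCell(hidden + img_dim, hidden)          # output module
        self.out = nn.Linear(hidden, vocab)

    def forward(self, mfcc, img_regions, captions):
        # mfcc: (B, T_sound, mfcc_dim); img_regions: (B, R, img_dim) from a CNN;
        # captions: (B, T_text) token ids used for teacher forcing.
        _, snd_state = self.sound_gru(mfcc)          # (1, B, hidden)
        snd_state = snd_state.squeeze(0)             # (B, hidden)

        # Sound-guided soft attention over image regions.
        scores = self.att_score(torch.tanh(
            self.att_img(img_regions) + self.att_snd(snd_state).unsqueeze(1)))
        alpha = F.softmax(scores, dim=1)             # (B, R, 1) attention weights
        ctx = (alpha * img_regions).sum(dim=1)       # (B, img_dim) attended feature

        # Decode the caption word by word, conditioned on the attended feature.
        h = snd_state
        logits = []
        for t in range(captions.size(1)):
            w = self.embed(captions[:, t])           # (B, hidden)
            h = self.decoder(torch.cat([w, ctx], dim=1), h)
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)            # (B, T_text, vocab)
```

In such a sketch, mfcc would come from a standard MFCC extractor (for example, librosa.feature.mfcc) and img_regions from the spatial feature map of a pretrained CNN; both are passed in here as plain tensors, since the exact feature extraction settings used in the paper are not specified in the abstract.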