
Spatial attention based visual semantic learning for action recognition in still images



Abstract

Visual semantic parts play crucial roles in still image-based action recognition. A majority of existing methods require additional manual annotations, such as human bounding boxes and predefined body parts, besides action labels to learn action-related visual semantic parts. However, labeling these manual annotations is rather time-consuming and labor-intensive. Moreover, not all manual annotations are effective when recognizing a specific action; some of them can be irrelevant and even misleading. To address these limitations, this paper proposes a multi-stage deep learning method called Spatial Attention based Action Mask Networks (SAAM-Nets). The proposed method does not need any additional annotations besides action labels to obtain action-specific visual semantic parts. Instead, we propose a spatial attention layer, injected into a convolutional neural network, that creates a specific action mask for each image using only action labels. Moreover, based on the action mask, we propose a region selection strategy to generate a semantic bounding box containing action-specific semantic parts. Furthermore, to effectively combine the information of the whole scene and the semantic box, two feature attention layers are adopted to obtain more discriminative representations. Experiments on four benchmark datasets have demonstrated that the proposed method can achieve promising performance compared with state-of-the-art methods. (C) 2020 Elsevier B.V. All rights reserved.
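The abstract describes the mechanism only in prose. As a rough, framework-free sketch of the two core ideas — a spatial attention layer producing a per-image action mask, and a region selection step deriving a semantic bounding box from that mask — the snippet below uses plain NumPy; the function names, the 1×1-convolution-style weighting, and the peak-fraction threshold are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def spatial_attention_mask(features, w):
    """Collapse a (C, H, W) feature map into an (H, W) attention mask.

    `w` plays the role of a learned 1x1 convolution: it weights the
    channels at each spatial location, and a softmax over all H*W
    locations turns the resulting scores into a mask that sums to 1.
    """
    scores = np.tensordot(w, features, axes=([0], [0]))   # (H, W)
    flat = scores.ravel()
    exp = np.exp(flat - flat.max())                       # stable softmax
    return (exp / exp.sum()).reshape(scores.shape)

def semantic_box(mask, keep=0.5):
    """Hypothetical region selection: the smallest box covering every
    location whose attention is at least `keep` times the peak value."""
    ys, xs = np.where(mask >= keep * mask.max())
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 7, 7))   # toy CNN feature map, C=8
w = rng.standard_normal(8)
mask = spatial_attention_mask(feat, w)
box = semantic_box(mask)
print(mask.shape, round(float(mask.sum()), 6), box)
```

In the paper's pipeline the mask weights would be learned end-to-end from action labels alone, and the selected box would then be cropped and fused with the whole-scene features via the two feature attention layers.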

Bibliographic record

  • Source
    Neurocomputing | 2020, Issue 6 | pp. 383-396 | 14 pages
  • Author affiliations

    Chinese Acad Sci, Xian Inst Opt & Precis Mech, Key Lab Spectral Imaging Technol CAS, Xian 710119, Shaanxi, Peoples R China | Univ Chinese Acad Sci, Beijing 100049, Peoples R China;

    Chinese Acad Sci, Xian Inst Opt & Precis Mech, Key Lab Spectral Imaging Technol CAS, Xian 710119, Shaanxi, Peoples R China;

    Chinese Acad Sci, Xian Inst Opt & Precis Mech, Key Lab Spectral Imaging Technol CAS, Xian 710119, Shaanxi, Peoples R China;

    Chinese Acad Sci, Xian Inst Opt & Precis Mech, Key Lab Spectral Imaging Technol CAS, Xian 710119, Shaanxi, Peoples R China;

  • Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
  • Original format: PDF
  • Language: English
  • CLC classification
  • Keywords

    Still image-based action recognition; Spatial attention; Semantic parts; Deep learning;

  • Date added: 2022-08-18 22:26:49

