
Spatial attention based visual semantic learning for action recognition in still images



Abstract

Visual semantic parts play crucial roles in still image-based action recognition. A majority of existing methods require additional manual annotations, such as human bounding boxes and predefined body parts, besides action labels to learn action-related visual semantic parts. However, labeling these manual annotations is rather time-consuming and labor-intensive. Moreover, not all manual annotations are effective when recognizing a specific action; some of them can be irrelevant and even misleading. To address these limitations, this paper proposes a multi-stage deep learning method called Spatial Attention based Action Mask Networks (SAAM-Nets). The proposed method does not need any additional annotations besides action labels to obtain action-specific visual semantic parts. Instead, we propose a spatial attention layer, injected into a convolutional neural network, that creates a specific action mask for each image using only action labels. Moreover, based on the action mask, we propose a region selection strategy to generate a semantic bounding box containing action-specific semantic parts. Furthermore, to effectively combine the information of the whole scene and the semantic box, two feature attention layers are adopted to obtain more discriminative representations. Experiments on four benchmark datasets have demonstrated that the proposed method can achieve promising performance compared with state-of-the-art methods. (C) 2020 Elsevier B.V. All rights reserved.
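The abstract describes the mechanism only in prose. As a rough, framework-free sketch of the two core ideas — a spatial attention layer producing a per-image action mask, and a region selection step deriving a semantic bounding box from that mask — the snippet below uses plain NumPy; the function names, the 1×1-convolution-style weighting, and the peak-fraction threshold are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def spatial_attention_mask(features, w):
    """Collapse a (C, H, W) feature map into an (H, W) attention mask.

    `w` plays the role of a learned 1x1 convolution: it weights the
    channels at each spatial location, and a softmax over all H*W
    locations turns the resulting scores into a mask that sums to 1.
    """
    scores = np.tensordot(w, features, axes=([0], [0]))   # (H, W)
    flat = scores.ravel()
    exp = np.exp(flat - flat.max())                       # stable softmax
    return (exp / exp.sum()).reshape(scores.shape)

def semantic_box(mask, keep=0.5):
    """Hypothetical region selection: the smallest box covering every
    location whose attention is at least `keep` times the peak value."""
    ys, xs = np.where(mask >= keep * mask.max())
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 7, 7))   # toy CNN feature map, C=8
w = rng.standard_normal(8)
mask = spatial_attention_mask(feat, w)
box = semantic_box(mask)
print(mask.shape, round(float(mask.sum()), 6), box)
```

In the paper's pipeline the mask weights would be learned end-to-end from action labels alone, and the selected box would then be cropped and fused with the whole-scene features via the two feature attention layers.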

Bibliographic record

  • Source
    Neurocomputing | 2020, Issue 6 | pp. 383-396 | 14 pages
  • Author affiliations

    Chinese Acad Sci, Xian Inst Opt & Precis Mech, Key Lab Spectral Imaging Technol CAS, Xian 710119, Shaanxi, Peoples R China | Univ Chinese Acad Sci, Beijing 100049, Peoples R China;

    Chinese Acad Sci, Xian Inst Opt & Precis Mech, Key Lab Spectral Imaging Technol CAS, Xian 710119, Shaanxi, Peoples R China;

    Chinese Acad Sci, Xian Inst Opt & Precis Mech, Key Lab Spectral Imaging Technol CAS, Xian 710119, Shaanxi, Peoples R China;

    Chinese Acad Sci, Xian Inst Opt & Precis Mech, Key Lab Spectral Imaging Technol CAS, Xian 710119, Shaanxi, Peoples R China;

  • Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
  • Original format: PDF
  • Language: English
  • CLC classification
  • Keywords

    Still image-based action recognition; Spatial attention; Semantic parts; Deep learning;

  • Date added: 2022-08-18 22:26:49

