Journal: The Visual Computer

Fine-grained action recognition using multi-view attentions

Abstract

Inflated 3D ConvNet (I3D) uses 3D convolution to enrich the semantic information of features, forming a strong baseline for human action recognition. However, 3D convolution extracts features by mixing spatial, temporal and cross-channel information together, so it cannot emphasize meaningful features along a specific dimension. This limitation matters most for cross-channel information, which is of crucial importance in recognizing fine-grained actions. In this paper, we propose a novel multi-view attention mechanism, the channel-spatial-temporal attention (CSTA) block, to guide the network to pay more attention to the clues useful for fine-grained action recognition. Specifically, CSTA consists of three branches: a channel-spatial branch, a channel-temporal branch and a spatial-temporal branch. By directly plugging these branches into I3D, we further explore how the insertion location and the number of blocks affect recognition accuracy. We also examine two different strategies for combining multiple CSTA blocks. Extensive experiments demonstrate the effectiveness of CSTA. When trained on RGB frames only, I3D equipped with CSTA (I3D-CSTA) achieves accuracies of 95.76% on UCF101 and 73.97% on HMDB51. These results are comparable to those of methods that use both RGB frames and optical flow. Moreover, with the assistance of optical flow, the recognition accuracies of I3D-CSTA rise to 98.2% on UCF101 and 82.9% on HMDB51, outperforming many state-of-the-art methods.
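The three-branch idea described in the abstract can be illustrated with a small, parameter-free NumPy sketch. This is not the authors' CSTA block: the abstract does not specify the layers inside each branch or how the branches are fused, so the pooling-plus-sigmoid attention, the averaging of the three maps, the residual connection, and the names `csta_sketch` and `sigmoid` are all assumptions made for illustration. A real block would presumably use learned convolutions in each branch.

```python
import numpy as np


def sigmoid(z):
    """Element-wise logistic function, mapping values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def csta_sketch(x):
    """Hypothetical three-view attention over a feature map.

    x has shape (C, T, H, W): channels, time, height, width.
    Each branch pools away the dimension(s) it does not attend to,
    then squashes the result into an attention map (assumption: no
    learned weights, unlike any real CSTA implementation).
    """
    # Channel-spatial view: pool over time -> shape (C, 1, H, W).
    cs = sigmoid(x.mean(axis=1, keepdims=True))
    # Channel-temporal view: pool over height and width -> shape (C, T, 1, 1).
    ct = sigmoid(x.mean(axis=(2, 3), keepdims=True))
    # Spatial-temporal view: pool over channels -> shape (1, T, H, W).
    st = sigmoid(x.mean(axis=0, keepdims=True))

    # Assumed fusion: average the three maps; broadcasting expands
    # each of them to the full (C, T, H, W) shape.
    attention = (cs + ct + st) / 3.0

    # Assumed residual form: keep the input and add the re-weighted features.
    return x + x * attention


if __name__ == "__main__":
    features = np.random.default_rng(0).normal(size=(8, 4, 7, 7))
    out = csta_sketch(features)
    print(out.shape)  # same shape as the input feature map
```

Because the output has the same shape as the input, a block of this form can be placed between existing layers of a 3D network without changing the surrounding layers, which is what makes it possible to vary the insertion location and the number of blocks.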
