Signal Processing Letters, IEEE

Sequential Segment Networks for Action Recognition



Abstract

Recently, deep convolutional networks (ConvNets) have achieved remarkable progress in action recognition in videos. Most existing deep frameworks treat a video as an unordered frame sequence and make a prediction by averaging the outputs over single RGB images or stacked optical flow fields. However, within a video, a complex action may consist of several atomic actions carried out sequentially over its temporal range. To address this issue, we propose a deep learning framework, sequential segment networks (SSN), to model video-level temporal structure. We obtain several short video snippets via a sparse temporal sampling strategy, concatenate the outputs of ConvNets learned from these snippets, and feed the concatenated consensus vector into a fully connected layer to learn the temporal structure. The sparse sampling strategy and video-level structure enable an efficient and effective training process for SSNs. Extensive empirical studies demonstrate that mining temporal structure significantly improves action recognition performance, and our approach achieves state-of-the-art results on the UCF101 and HMDB51 datasets.
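The pipeline the abstract describes — segment the video, sparsely sample one snippet per segment, concatenate per-snippet features, and classify the consensus vector with a fully connected layer — can be sketched as follows. This is a minimal NumPy illustration under assumed shapes, not the authors' implementation; `snippet_features` is a hypothetical stand-in for the per-snippet ConvNet, and the segment count, feature dimension, and class count are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def sparse_sample(num_frames, k):
    # Sparse temporal sampling: split the video into k equal-length
    # segments and draw one frame index at random from each segment.
    bounds = np.linspace(0, num_frames, k + 1, dtype=int)
    return [int(rng.integers(bounds[i], bounds[i + 1])) for i in range(k)]

def snippet_features(frame_idx, dim):
    # Hypothetical stand-in for the snippet-level ConvNet: returns a
    # deterministic pseudo-feature vector for the sampled snippet.
    return np.random.default_rng(frame_idx).standard_normal(dim)

def ssn_forward(num_frames, k=3, dim=8, num_classes=5):
    idxs = sparse_sample(num_frames, k)
    # Concatenate the k snippet features into one video-level
    # consensus vector, then apply a fully connected layer to it.
    consensus = np.concatenate([snippet_features(i, dim) for i in idxs])
    W = rng.standard_normal((num_classes, k * dim))  # FC weights (untrained)
    b = np.zeros(num_classes)                        # FC bias
    logits = W @ consensus + b
    return idxs, logits

idxs, logits = ssn_forward(num_frames=90)
print(idxs, logits.shape)
```

Because each segment's sampling range is disjoint and ordered, the sampled indices preserve the temporal order of the atomic actions, which is what lets the fully connected layer model sequential structure rather than an unordered average.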
