Journal: Signal Processing: Image Communication — A Publication of the European Association for Signal Processing

TS-LSTM and temporal-inception: Exploiting spatiotemporal dynamics for activity recognition




Abstract

Recent two-stream deep Convolutional Neural Networks (ConvNets) have made significant progress in recognizing human actions in videos. Despite their success, methods extending the basic two-stream ConvNet have not systematically explored possible network architectures to further exploit spatiotemporal dynamics within video sequences. Further, such networks often use different baseline two-stream networks. Therefore, the differences and the distinguishing factors between various methods using Recurrent Neural Networks (RNNs) or Convolutional Neural Networks on temporally-constructed feature vectors (Temporal-ConvNets) are unclear. In this work, we would like to answer the question: given the spatial and motion feature representations over time, what is the best way to exploit the temporal information? Toward this end, we first demonstrate a strong baseline two-stream ConvNet using ResNet-101. We use this baseline to thoroughly examine the use of both RNNs and Temporal-ConvNets for extracting spatiotemporal information. Building upon our experimental results, we then propose and investigate two different networks to further integrate spatiotemporal information: (1) Temporal Segment RNN and (2) Inception-style Temporal-ConvNet. We demonstrate that both RNNs (with LSTMs) and Temporal-ConvNets applied to spatiotemporal feature matrices can exploit spatiotemporal dynamics to improve overall performance. Our analysis identifies specific limitations for each method that could form the basis of future work. Our experimental results on the UCF101 and HMDB51 datasets achieve performance comparable to the state of the art, 94.1% and 69.0%, respectively, without requiring extensive temporal augmentation or end-to-end training.
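The "temporal segment" idea named in the abstract (feeding per-segment summaries of frame-level features to a temporal model such as an LSTM) can be illustrated with a minimal sketch. This is not the paper's implementation; the function name, the use of mean pooling within segments, and the toy feature vectors are illustrative assumptions.

```python
# Hypothetical sketch: divide per-frame feature vectors into equal temporal
# segments and average within each segment, producing one summary vector per
# segment. A temporal model (e.g. an LSTM) would then consume these summaries.

def temporal_segments(features, num_segments):
    """Split a list of per-frame feature vectors into num_segments
    consecutive chunks and mean-pool each chunk."""
    n = len(features)
    dim = len(features[0])
    seg_len = n // num_segments  # assumes n is divisible by num_segments
    segments = []
    for s in range(num_segments):
        chunk = features[s * seg_len:(s + 1) * seg_len]
        avg = [sum(vec[d] for vec in chunk) / len(chunk) for d in range(dim)]
        segments.append(avg)
    return segments

# Toy example: 6 frames of 2-D features, summarized into 3 segments.
frames = [[1, 0], [3, 0], [0, 2], [0, 4], [5, 5], [7, 7]]
print(temporal_segments(frames, 3))  # [[2.0, 0.0], [0.0, 3.0], [6.0, 6.0]]
```

Pooling within segments before the recurrent step reduces sequence length and gives each timestep a broader temporal receptive field, which is the intuition behind segment-based temporal modeling.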
