Journal: Signal Processing: Image Communication — A Publication of the European Association for Signal Processing

TS-LSTM and temporal-inception: Exploiting spatiotemporal dynamics for activity recognition




Abstract

Recent two-stream deep Convolutional Neural Networks (ConvNets) have made significant progress in recognizing human actions in videos. Despite their success, methods extending the basic two-stream ConvNet have not systematically explored possible network architectures to further exploit spatiotemporal dynamics within video sequences. Further, such networks often use different baseline two-stream networks. Therefore, the differences and the distinguishing factors between various methods using Recurrent Neural Networks (RNNs) or Convolutional Neural Networks on temporally-constructed feature vectors (Temporal-ConvNets) are unclear. In this work, we would like to answer the question: given the spatial and motion feature representations over time, what is the best way to exploit the temporal information? Toward this end, we first demonstrate a strong baseline two-stream ConvNet using ResNet-101. We use this baseline to thoroughly examine the use of both RNNs and Temporal-ConvNets for extracting spatiotemporal information. Building upon our experimental results, we then propose and investigate two different networks to further integrate spatiotemporal information: (1) Temporal Segment RNN and (2) Inception-style Temporal-ConvNet. We demonstrate that both RNNs (with LSTMs) and Temporal-ConvNets applied to spatiotemporal feature matrices can exploit spatiotemporal dynamics to improve overall performance. Our analysis identifies specific limitations for each method that could form the basis of future work. Our experimental results on the UCF101 and HMDB51 datasets achieve performance comparable to the state of the art, 94.1% and 69.0%, respectively, without requiring extensive temporal augmentation or end-to-end training.
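The "temporal segment" idea named in the abstract (feeding per-segment summaries of frame-level features to a temporal model such as an LSTM) can be illustrated with a minimal sketch. This is not the paper's implementation; the function name, the use of mean pooling within segments, and the toy feature vectors are illustrative assumptions.

```python
# Hypothetical sketch: divide per-frame feature vectors into equal temporal
# segments and average within each segment, producing one summary vector per
# segment. A temporal model (e.g. an LSTM) would then consume these summaries.

def temporal_segments(features, num_segments):
    """Split a list of per-frame feature vectors into num_segments
    consecutive chunks and mean-pool each chunk."""
    n = len(features)
    dim = len(features[0])
    seg_len = n // num_segments  # assumes n is divisible by num_segments
    segments = []
    for s in range(num_segments):
        chunk = features[s * seg_len:(s + 1) * seg_len]
        avg = [sum(vec[d] for vec in chunk) / len(chunk) for d in range(dim)]
        segments.append(avg)
    return segments

# Toy example: 6 frames of 2-D features, summarized into 3 segments.
frames = [[1, 0], [3, 0], [0, 2], [0, 4], [5, 5], [7, 7]]
print(temporal_segments(frames, 3))  # [[2.0, 0.0], [0.0, 3.0], [6.0, 6.0]]
```

Pooling within segments before the recurrent step reduces sequence length and gives each timestep a broader temporal receptive field, which is the intuition behind segment-based temporal modeling.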
