Temporal Segment Networks for Action Recognition in Videos

Wang Limin; Xiong Yuanjun; Wang Zhe; Qiao Yu; Lin Dahua; Tang Xiaoou; Van Gool Luc




Abstract

We present a general and flexible video-level framework for learning action models in videos. This method, called temporal segment network (TSN), aims to model long-range temporal structure with a new segment-based sampling and aggregation scheme. This design enables the TSN framework to learn action models efficiently from the whole video. The learned models can be easily deployed for action recognition in both trimmed and untrimmed videos, using simple average pooling and multi-scale temporal window integration, respectively. We also study a series of good practices for implementing the TSN framework given limited training samples. Our approach obtains state-of-the-art performance on five challenging action recognition benchmarks: HMDB51 (71.0 percent), UCF101 (94.9 percent), THUMOS14 (80.1 percent), ActivityNet v1.2 (89.6 percent), and Kinetics400 (75.7 percent). In addition, using the proposed RGB difference as a simple motion representation, our method still achieves competitive accuracy on UCF101 (91.0 percent) while running at 340 FPS. Furthermore, based on the proposed TSN framework, we won the video classification track at the ActivityNet challenge 2016 among 24 teams.
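The segment-based sampling and average-pooling aggregation described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the details, so everything specific here is an assumption made for illustration: the video is split into equal-length segments, one frame index is drawn at random from each segment, and per-snippet class scores are averaged into a video-level score. The function names `sample_segment_indices` and `average_consensus` are hypothetical and are not taken from the authors' code.

```python
import random
from typing import List, Sequence


def sample_segment_indices(num_frames: int, num_segments: int,
                           rng: random.Random) -> List[int]:
    """Draw one frame index from each of num_segments equal-length chunks.

    Assumption: segments are contiguous, non-overlapping, and cover the
    whole video; the source abstract does not specify this.
    """
    if num_segments <= 0 or num_frames < num_segments:
        raise ValueError("need at least one frame per segment")
    indices = []
    for k in range(num_segments):
        start = (k * num_frames) // num_segments       # inclusive
        end = ((k + 1) * num_frames) // num_segments   # exclusive
        indices.append(rng.randrange(start, end))
    return indices


def average_consensus(snippet_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average per-snippet class scores into one video-level score vector."""
    if not snippet_scores:
        raise ValueError("need at least one snippet")
    num_snippets = len(snippet_scores)
    num_classes = len(snippet_scores[0])
    return [sum(s[c] for s in snippet_scores) / num_snippets
            for c in range(num_classes)]


if __name__ == "__main__":
    rng = random.Random(0)
    # One index falls in each of [0, 30), [30, 60), [60, 90).
    print(sample_segment_indices(num_frames=90, num_segments=3, rng=rng))
    # Hypothetical two-class scores for three snippets.
    print(average_consensus([[0.2, 0.8], [0.4, 0.6], [0.6, 0.4]]))
```

Because each segment contributes exactly one snippet, the cost per video in this sketch depends on the number of segments rather than on the video length, which is one plausible reading of how the framework uses the whole video efficiently.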


Bibliographic details

Source
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, No. 11, pp. 2740-2755 (16 pages)
Authors
Wang Limin; Xiong Yuanjun; Wang Zhe; Qiao Yu; Lin Dahua; Tang Xiaoou; Van Gool Luc
Author affiliations

Nanjing Univ State Key Lab Novel Software Technol Nanjing 210023 Jiangsu Peoples R China;

Amazon Web Serv Seattle WA 98101 USA;

Univ Calif Irvine Dept Comp Sci Irvine CA 92697 USA;

Chinese Acad Sci Shenzhen Inst Adv Technol Shenzhen 518055 Peoples R China;

Chinese Univ Hong Kong Dept Informat Engn Shatin Hong Kong Peoples R China;

Swiss Fed Inst Technol Comp Vis Lab CH-8092 Zurich Switzerland;

Original format: PDF
Language: English
Keywords
Action recognition; temporal segment networks; temporal modeling; good practices; ConvNets;


