A Robust and Efficient Video Representation for Action Recognition

Wang Heng; Oneata Dan; Verbeek Jakob; Schmid Cordelia




Abstract

This paper introduces a state-of-the-art video representation and applies it to efficient action recognition and detection. We first propose to improve the popular dense trajectory features by explicit camera motion estimation. More specifically, we extract feature point matches between frames using SURF descriptors and dense optical flow. The matches are used to estimate a homography with RANSAC. To improve the robustness of homography estimation, a human detector is employed to remove outlier matches from the human body as human motion is not constrained by the camera. Trajectories consistent with the homography are considered as due to camera motion, and thus removed. We also use the homography to cancel out camera motion from the optical flow. This results in significant improvement on motion-based HOF and MBH descriptors. We further explore the recent Fisher vector as an alternative feature encoding approach to the standard bag-of-words (BOW) histogram, and consider different ways to include spatial layout information in these encodings. We present a large and varied set of evaluations, considering (i) classification of short basic actions on six datasets, (ii) localization of such actions in feature-length movies, and (iii) large-scale recognition of complex events. We find that our improved trajectory features significantly outperform previous dense trajectories, and that Fisher vectors are superior to BOW encodings for video recognition tasks. In all three tasks, we show substantial improvements over the state-of-the-art results.
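The trajectory-removal step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes the per-frame homographies have already been estimated (for example with RANSAC on the SURF and optical-flow matches) and that each one maps coordinates in frame t to frame t+1. The function names and the `min_residual` threshold are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Homography = Sequence[Sequence[float]]  # 3x3 matrix, row-major


def warp_point(H: Homography, p: Point) -> Point:
    """Apply a 3x3 homography to a 2D point using homogeneous coordinates."""
    x, y = p
    xw = H[0][0] * x + H[0][1] * y + H[0][2]
    yw = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (xw / w, yw / w)


def residual_motion(traj: List[Point], homographies: List[Homography]) -> float:
    """Largest per-step displacement left after accounting for camera motion.

    Assumption: homographies[t] maps frame t to frame t+1, so the warped
    point is where camera motion alone would have moved the tracked point.
    """
    max_res = 0.0
    for t in range(len(traj) - 1):
        px, py = warp_point(homographies[t], traj[t])
        qx, qy = traj[t + 1]
        res = ((qx - px) ** 2 + (qy - py) ** 2) ** 0.5
        max_res = max(max_res, res)
    return max_res


def remove_camera_trajectories(
    trajs: List[List[Point]],
    homographies: List[Homography],
    min_residual: float = 1.0,  # hypothetical threshold, in pixels
) -> List[List[Point]]:
    """Keep only trajectories whose motion is not explained by the homography."""
    return [tr for tr in trajs if residual_motion(tr, homographies) >= min_residual]


# Usage: the camera pans 2 pixels to the right in every frame.
H = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
background = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0)]   # moves exactly with the camera
actor = [(10.0, 10.0), (15.0, 10.0), (20.0, 10.0)]  # moves 3 pixels more per frame
kept = remove_camera_trajectories([background, actor], [H, H])
```

In this toy example only the `actor` trajectory survives, because the `background` trajectory is fully explained by the pan. Warping the optical flow itself with the same homography, as the abstract describes for the HOF and MBH descriptors, would follow the same idea but is not shown here.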


Bibliographic details

Source
International Journal of Computer Vision | 2016, Issue 3 | 20 pages
Authors
Wang Heng; Oneata Dan; Verbeek Jakob; Schmid Cordelia
Original format: PDF
Language: English
Chinese Library Classification: Computing technology, computer technology
Keywords
Action recognition; Action detection; Multimedia event detection

Similar documents

1. A Robust and Efficient Video Representation for Action Recognition [J]. Wang Heng, Oneata Dan, Verbeek Jakob. International Journal of Computer Vision, 2016, Issue 3.
2. Body Surface Context: A New Robust Feature for Action Recognition From Depth Videos [J]. Song Y., Tang J., Liu F. IEEE Transactions on Circuits and Systems for Video Technology, 2014, Issue 6.
3. KeyFrame extraction based on face quality measurement and convolutional neural network for efficient face recognition in videos [J]. Abed Rahma, Bahroun Sahbi, Zagrouba Ezzeddine. Multimedia Tools and Applications, 2021, Issue 15.
4. Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition [C]. Shuyang Sun, Zhanghui Kuang, Lu Sheng. IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018.
5. Robust representation and recognition of actions in video [D]. Natarajan, Pradeep. 2009.
6. Marginalised Stacked Denoising Autoencoders for Robust Representation of Real-Time Multi-View Action Recognition [O]. Feng Gu, Francisco Flórez-Revuelta, Dorothy Monekosso. 2015.
7. A robust and efficient video representation for action recognition [O]. Wang, Heng, Oneata, Dan, Verbeek, Jakob. 2016.
