Journal: International Journal of Computer Vision

A Robust and Efficient Video Representation for Action Recognition


Abstract

This paper introduces a state-of-the-art video representation and applies it to efficient action recognition and detection. We first propose to improve the popular dense trajectory features by explicit camera motion estimation. More specifically, we extract feature point matches between frames using SURF descriptors and dense optical flow. The matches are used to estimate a homography with RANSAC. To improve the robustness of homography estimation, a human detector is employed to remove outlier matches from the human body as human motion is not constrained by the camera. Trajectories consistent with the homography are considered as due to camera motion, and thus removed. We also use the homography to cancel out camera motion from the optical flow. This results in significant improvement on motion-based HOF and MBH descriptors. We further explore the recent Fisher vector as an alternative feature encoding approach to the standard bag-of-words (BOW) histogram, and consider different ways to include spatial layout information in these encodings. We present a large and varied set of evaluations, considering (i) classification of short basic actions on six datasets, (ii) localization of such actions in feature-length movies, and (iii) large-scale recognition of complex events. We find that our improved trajectory features significantly outperform previous dense trajectories, and that Fisher vectors are superior to BOW encodings for video recognition tasks. In all three tasks, we show substantial improvements over the state-of-the-art results.
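The camera-motion step described in the abstract (match points between frames, drop matches on detected people, fit a homography with RANSAC, then discard trajectories that agree with it) can be illustrated with a small sketch. This is not the authors' implementation: it uses plain NumPy on synthetic point matches instead of SURF and optical flow, and the function names, the 1-pixel thresholds, the iteration count, and the bounding box are all assumptions chosen for illustration. A practical pipeline would more likely normalize coordinates and call a library routine such as OpenCV's `cv2.findHomography` with the `cv2.RANSAC` method.

```python
import numpy as np

def fit_homography(src, dst):
    """Direct linear transform: H (3x3) with dst ~ H @ src, from >= 4 matches."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The null vector of the stacked constraints is the last right singular vector.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 3)

def project(h, pts):
    """Apply homography h to an (N, 2) array of points."""
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ h.T
    return homog[:, :2] / homog[:, 2:3]

def ransac_homography(src, dst, thresh=1.0, iters=500, seed=0):
    """Return (H, inlier_mask); thresh is a reprojection error in pixels (assumed)."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(src), size=4, replace=False)
        h = fit_homography(src[idx], dst[idx])
        with np.errstate(all="ignore"):  # degenerate samples may divide by zero
            err = np.linalg.norm(project(h, src) - dst, axis=1)
            inliers = err < thresh
        if inliers.sum() > best.sum():
            best = inliers
    # Refit on all inliers of the best hypothesis.
    return fit_homography(src[best], dst[best]), best

def outside_boxes(pts, boxes):
    """True for points that lie in none of the (x0, y0, x1, y1) boxes."""
    keep = np.ones(len(pts), dtype=bool)
    for x0, y0, x1, y1 in boxes:
        inside = ((pts[:, 0] >= x0) & (pts[:, 0] <= x1) &
                  (pts[:, 1] >= y0) & (pts[:, 1] <= y1))
        keep &= ~inside
    return keep

# Synthetic frame pair: 60 background points follow the camera homography,
# 15 points on a person move by an extra 8 pixels on top of it.
rng = np.random.default_rng(1)
H_true = np.array([[1.0, 0.01, 5.0], [-0.01, 1.0, -3.0], [1e-5, 0.0, 1.0]])
background = rng.uniform(0, 320, size=(60, 2))
actor = rng.uniform(100, 160, size=(15, 2))
src = np.vstack([background, actor])
dst = project(H_true, src)
dst[60:] += [8.0, 0.0]

# Hypothetical detector output; it deliberately covers only part of the person,
# so RANSAC still has to reject the remaining actor matches.
person_boxes = [(100, 100, 140, 140)]

keep = outside_boxes(src, person_boxes)
H, _ = ransac_homography(src[keep], dst[keep])

# Displacements explained by H are attributed to the camera and would be removed.
residual = np.linalg.norm(dst - project(H, src), axis=1)
is_camera_motion = residual < 1.0
print("removed as camera motion:", int(is_camera_motion.sum()), "of", len(src))
```

Subtracting `project(H, src)` from `dst`, as done for `residual`, is also the basic idea behind cancelling camera motion from the flow before computing motion descriptors; the sketch applies it to sparse points only.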
