Novel Algorithms for Human Action Recognition in Videos

Abstract

Human action recognition from videos plays a crucial role in applications such as video annotation and retrieval, intelligent surveillance, sports video analysis, and human-computer interaction. It is a challenging problem in computer vision because human actions are highly variable. Variations in scale, illumination, viewpoint, and background in the video make the problem even harder. In this thesis, we propose new algorithms for human action recognition based on novel video features and descriptors that are effective at discriminating complex actions. We also propose a new learning method for cross-dataset human action recognition.

In Chapter 2, we propose a new motion feature called difference HOG (dHOG). Based on it, we develop a new video descriptor that uses a dictionary to encode the pairwise spatial co-occurrences of motion cells and their spatial displacements within individual frames. Temporal co-occurrence matrices then capture the temporal co-occurrences of code words, and the final descriptor is built by concatenating the Bag-of-Words (BoW) representation of the code words with the PCA-reduced temporal co-occurrence matrices.

In Chapter 3, we build video descriptors by dividing an action video into temporal segments and extracting low-level features (HOG, HOF, and MBH) from each segment. An ensemble of many-to-one encoders then learns generalized high-level features from the individual segments. We introduce two new algorithms for unsupervised segmentation of a video into temporal segments that correspond to the sub-actions in an action video. The first performs K-means clustering of the low-level features, followed by iterative adjustment of the segment boundaries. The second uses Adaptive Affinity Propagation to cluster the low-level features. Dynamic Time Warping is then used to iteratively merge segments, producing a hierarchical tree representation of the action video.

In Chapter 4, we tackle cross-dataset action recognition by using knowledge from a known dataset to aid the training and classification of a new dataset that is not fully annotated. The main challenge in cross-dataset action recognition is the large intra-class variance introduced by different video sources. We propose a transfer-learning method based on a dual many-to-one encoder framework, in which one encoder is trained on the source dataset and the other on the target dataset, in parallel. The trained encoders map features from the two datasets into a generalized feature space, enabling knowledge transfer between them. During training, the generalized features extracted from the source dataset augment the training set of the insufficiently annotated target dataset.

We applied our algorithms to several challenging benchmark datasets to demonstrate their effectiveness. The proposed algorithms outperformed many state-of-the-art methods in recognition accuracy; most notably, the second segmentation-based method in Chapter 3 beat the state-of-the-art result on the challenging HMDB51 dataset by over 20%.
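To make the final step of the Chapter 2 descriptor concrete, the following Python sketch builds a Bag-of-Words histogram and a temporal co-occurrence matrix from per-frame code words and concatenates them. It is a minimal illustration under stated assumptions, not the thesis's implementation: it assumes each frame has already been quantized into a list of code-word indices, and that a temporal co-occurrence means code word `a` in frame `t` together with code word `b` in frame `t + offset`. The function names and the `offset` parameter are hypothetical, and the PCA reduction the thesis applies to the co-occurrence matrices is omitted.

```python
from collections import Counter


def bow_histogram(frames, vocab_size):
    """Bag-of-Words: count each code word over all frames."""
    counts = Counter(word for frame in frames for word in frame)
    return [counts[i] for i in range(vocab_size)]


def temporal_cooccurrence(frames, vocab_size, offset=1):
    """C[a][b] counts code word a in frame t co-occurring with word b in frame t + offset.

    This offset-based definition is an assumption made for the sketch.
    """
    C = [[0] * vocab_size for _ in range(vocab_size)]
    for t in range(len(frames) - offset):
        for a in frames[t]:
            for b in frames[t + offset]:
                C[a][b] += 1
    return C


def build_descriptor(frames, vocab_size, offset=1):
    """Concatenate the BoW histogram with the flattened co-occurrence matrix.

    The thesis PCA-reduces the co-occurrence matrices first; that step is omitted here.
    """
    hist = bow_histogram(frames, vocab_size)
    C = temporal_cooccurrence(frames, vocab_size, offset)
    return hist + [v for row in C for v in row]


# Three toy frames, each already quantized into code-word indices.
frames = [[0, 1], [1], [2, 0]]
print(bow_histogram(frames, 3))          # [2, 2, 1]
print(temporal_cooccurrence(frames, 3))  # [[0, 1, 0], [1, 1, 1], [0, 0, 0]]
print(len(build_descriptor(frames, 3)))  # 12
```

Flattening the full matrix makes the descriptor grow quadratically with the vocabulary size, which is one practical reason to reduce the co-occurrence part (for example with PCA) before concatenation.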
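The first Chapter 3 segmentation algorithm (K-means clustering of low-level features, followed by iterative adjustment of segment boundaries) can be sketched as follows. The abstract does not state the adjustment rule, so this sketch assumes a hypothetical one: contiguous runs of frames with the same cluster label become segments, and any segment shorter than `min_len` frames is repeatedly absorbed into a neighbor. The deterministic centroid initialization, the toy 1-D features, and all function names are illustrative choices, not details from the thesis.

```python
def sq_dist(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans_labels(points, k, iters=20):
    """Plain K-means on per-frame feature vectors; returns one cluster label per frame."""
    n = len(points)
    # Deterministic init (illustrative choice): evenly spaced frames as seeds.
    centroids = [list(points[i * n // k]) for i in range(k)]
    labels = [0] * n
    for _ in range(iters):
        for i, p in enumerate(points):
            labels[i] = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
        for c in range(k):
            members = [points[i] for i in range(n) if labels[i] == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def label_runs(labels):
    """Split the frame sequence into [start, end) runs of identical labels."""
    segments, start = [], 0
    for t in range(1, len(labels) + 1):
        if t == len(labels) or labels[t] != labels[start]:
            segments.append([start, t])
            start = t
    return segments


def merge_short_segments(segments, min_len):
    """Hypothetical boundary adjustment: absorb segments shorter than min_len into a neighbor."""
    segments = [list(s) for s in segments]
    changed = True
    while changed and len(segments) > 1:
        changed = False
        for i, (start, end) in enumerate(segments):
            if end - start < min_len:
                if i > 0:
                    segments[i - 1][1] = end    # extend the previous segment
                else:
                    segments[i + 1][0] = start  # first segment: extend the next one
                del segments[i]
                changed = True
                break
    return [tuple(s) for s in segments]


def segment_video(frame_features, k, min_len):
    return merge_short_segments(label_runs(kmeans_labels(frame_features, k)), min_len)


# Toy 1-D features: three sub-actions, with one noisy frame at index 2.
features = [[v] for v in [0.0, 0.1, 5.0, 0.1, 5.0, 5.1, 5.2, 5.1, 10.0, 10.1, 10.2, 10.1]]
print(kmeans_labels(features, 3))             # [0, 0, 1, 0, 1, 1, 1, 1, 2, 2, 2, 2]
print(segment_video(features, 3, min_len=3))  # [(0, 4), (4, 8), (8, 12)]
```

K-means alone ignores temporal order, so a single noisy frame can split a sub-action into several runs; the merge step in this sketch restores temporally contiguous segments, which is the kind of problem a boundary-adjustment stage has to address.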
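The Chapter 3 step that uses Dynamic Time Warping to merge segments into a hierarchical tree can be illustrated with a plain DTW distance and a greedy bottom-up merge. This is a sketch under assumptions the abstract does not confirm: one scalar feature per frame, absolute difference as the local cost, and merges restricted to temporally adjacent segments, choosing the pair with the smallest DTW distance at each step. The function names are hypothetical. Leaves of the returned nested tuple are the original segment indices.

```python
def dtw_distance(a, b):
    """Classic DTW between two scalar sequences with |x - y| as the local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def build_segment_tree(segments):
    """Greedily merge the most similar adjacent pair (by DTW) until one node remains.

    Each node is (tree, frames); restricting merges to adjacent segments
    is an assumption of this sketch. Expects at least one segment.
    """
    nodes = [(i, list(seq)) for i, seq in enumerate(segments)]
    while len(nodes) > 1:
        best = min(range(len(nodes) - 1),
                   key=lambda i: dtw_distance(nodes[i][1], nodes[i + 1][1]))
        (left_tree, left_seq), (right_tree, right_seq) = nodes[best], nodes[best + 1]
        nodes[best:best + 2] = [((left_tree, right_tree), left_seq + right_seq)]
    return nodes[0][0]


segments = [[0, 0, 1], [0, 1, 1], [5, 5], [9, 9, 8]]
print(dtw_distance([0, 0, 1], [0, 1, 1]))  # 0.0
print(dtw_distance([5, 5], [9, 9, 8]))     # 11.0
print(build_segment_tree(segments))        # ((0, 1), (2, 3))
```

DTW is a natural fit here because sub-action segments of the same kind can differ in length and speed; the warping path aligns them before comparing, so `[0, 0, 1]` and `[0, 1, 1]` have distance zero despite the shifted transition.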

Bibliographic details

  • Author: Xu, Tiantian
  • Affiliation: New York University Tandon School of Engineering
  • Degree-granting institution: New York University Tandon School of Engineering
  • Discipline: Computer science
  • Degree: Ph.D.
  • Year: 2017
  • Pages: 115
  • Original format: PDF
  • Language: English
