Journal: Neurocomputing
Dynamic texture and scene classification by transferring deep image features


Abstract

Dynamic texture and scene classification are two fundamental problems in understanding natural video content. Extracting robust and effective features is a crucial step towards solving these problems. However, existing approaches are sensitive to varying illumination, viewpoint changes, or camera motion, and/or lack spatial information. Inspired by the success of deep structures in image classification, we attempt to leverage a deep structure to extract features for dynamic texture and scene classification. To tackle the challenges in training a deep structure, we propose to transfer prior knowledge from the image domain to the video domain. More specifically, we apply a well-trained Convolutional Neural Network (ConvNet) as a feature extractor to extract mid-level features from each frame, and then form the video-level representation by concatenating the first- and second-order statistics over the mid-level features. We term this two-level feature extraction scheme the Transferred ConvNet Feature (TCoF). Moreover, we explore two different implementations of the TCoF scheme, i.e., the spatial TCoF and the temporal TCoF. In the spatial TCoF, the mean-removed frames are used as the inputs of the ConvNet, whereas in the temporal TCoF, the differences between two adjacent frames are used as the inputs. We systematically evaluate the proposed spatial and temporal TCoF schemes on three benchmark data sets, DynTex, YUPENN, and Maryland, and demonstrate that the proposed approach yields superior performance. (C) 2015 Elsevier B.V. All rights reserved.
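The abstract's pipeline — per-frame ConvNet features pooled into a video-level descriptor, with two input variants — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify which second-order statistic is used, so per-dimension variance is assumed here, and the ConvNet feature-extraction step is abstracted away as a precomputed `(num_frames, feature_dim)` array.

```python
import numpy as np

def spatial_inputs(frames):
    """Spatial TCoF inputs: mean-removed frames.

    frames: array of shape (num_frames, H, W[, C]).
    Subtracts the per-video mean frame, per the spatial variant.
    """
    return frames - frames.mean(axis=0, keepdims=True)

def temporal_inputs(frames):
    """Temporal TCoF inputs: differences between adjacent frames."""
    return frames[1:] - frames[:-1]

def video_representation(frame_features):
    """Video-level TCoF descriptor.

    frame_features: (num_frames, feature_dim) mid-level features,
    assumed to come from a pretrained ConvNet applied to each frame.
    Concatenates first-order (mean) and an assumed second-order
    (per-dimension variance) statistic over the frame axis.
    """
    mu = frame_features.mean(axis=0)   # first-order statistics
    var = frame_features.var(axis=0)   # second-order statistics (assumption)
    return np.concatenate([mu, var])   # shape: (2 * feature_dim,)
```

The resulting fixed-length vector can then be fed to any standard classifier (e.g. a linear SVM), which is one common use of such pooled ConvNet descriptors.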
