Violent Scene Detection using a Super Descriptor Tensor Decomposition

Abstract

This article presents a new method for violent scene detection using super descriptor tensor decomposition. Multi-modal local features comprising auditory and visual features are extracted from Mel-frequency cepstral coefficients (including first- and second-order derivatives) and refined dense trajectories. A large number of dense trajectories is usually extracted from a video sequence; some of these trajectories are unnecessary and can degrade accuracy. We propose to refine the dense trajectories by selecting only discriminative trajectories in the region of interest. Visual descriptors consisting of oriented gradient and motion boundary histograms are computed along the refined dense trajectories. In traditional bag-of-visual-words techniques, the feature descriptors are concatenated to form a single large feature vector for classification. This destroys the spatio-temporal interactions among features extracted from multi-modal data. To address this problem, a super descriptor tensor decomposition is proposed. The extracted feature descriptors are first encoded using the super descriptor vector method. Then the encoded features are arranged as tensors so as to retain the spatio-temporal structure of the features. To obtain a compact set of features for classification, the Tucker-3 decomposition is applied to the super descriptor tensors, followed by feature selection using Fisher feature ranking. The obtained features are fed to a support vector machine classifier. Experimental evaluation is performed on the violence detection benchmark dataset MediaEval VSD2014. The proposed method outperforms most of the state-of-the-art methods, achieving MAP2014 scores of 60.2% and 67.8% on two subsets of the dataset.
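
The back end of the pipeline described above (Tucker-3 compression of a super descriptor tensor, Fisher feature ranking, and an SVM) can be illustrated with a minimal sketch. The tensor shape, Tucker ranks, number of retained features, and the synthetic data below are illustrative assumptions rather than the authors' configuration; tensorly and scikit-learn stand in for whatever toolchain the paper used.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import tucker
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def tucker3_features(tensor, ranks=(8, 8, 8)):
    # Tucker-3 decomposition: keep the flattened core tensor as a compact feature vector.
    core, factors = tucker(tl.tensor(tensor), rank=ranks)
    return tl.to_numpy(core).ravel()

# Hypothetical super descriptor tensors, one per clip: (descriptor, spatial, temporal) modes.
X = np.stack([tucker3_features(rng.normal(size=(32, 16, 16))) for _ in range(40)])
y = rng.integers(0, 2, size=40)  # toy labels: 1 = violent, 0 = non-violent

# Fisher feature ranking: between-class separation over within-class scatter, per feature.
mu0, mu1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
v0, v1 = X[y == 0].var(axis=0), X[y == 1].var(axis=0)
fisher = (mu0 - mu1) ** 2 / (v0 + v1 + 1e-12)
top = np.argsort(fisher)[::-1][:64]  # keep the 64 highest-ranked features (assumed cut-off)

# Linear SVM on the selected features, as in the classification stage of the abstract.
clf = SVC(kernel="linear").fit(X[:, top], y)
print("training accuracy:", clf.score(X[:, top], y))
```

In this sketch only the Tucker core is used as the compressed representation; the factor matrices are discarded, which is one common way to obtain a fixed-length feature vector from the decomposition.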