Journal: IEEE Transactions on Multimedia

Hierarchical Concept Score Postprocessing and Concept-Wise Normalization in CNN-Based Video Event Recognition



Abstract

This paper addresses video event recognition based on frame-level convolutional neural network (CNN) descriptors. Using transfer learning, image-trained descriptors are applied to the video domain to make event recognition feasible in scenarios with limited computational resources. After fine-tuning existing CNN concept score extractors pretrained on ImageNet, the outputs of the different fully connected layers are employed as frame descriptors. The resulting descriptors are hierarchically postprocessed and combined with novel and efficient pooling and normalization methods. As the major contributions of this paper to video event recognition, we present a postprocessing scheme in which the hierarchy and the relative shortest distance of concepts in the WordNet concept tree are taken into account to alleviate the uncertainty of the concept scores at the output of the CNN. In addition, we propose a concept-wise power law normalization method that outperforms the widely used power law normalization. Integrating these approaches yields high-performance average (max) pooling-based video event recognition. Compared to average (max) pooling combined with state-of-the-art normalization methods and fine-tuned support vector machine classification, the proposed processing scheme improves event recognition accuracy in terms of mean average precision on the Columbia Consumer Video and Unstructured Social Activity Attribute datasets, while achieving comparable results on the UCF101 and ActivityNet datasets.
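
The pooling and normalization steps named in the abstract can be illustrated with a minimal sketch. It assumes the common signed form of power law normalization, sign(x)·|x|^α followed by L2 normalization, and it models the concept-wise variant simply as one exponent per concept. The abstract does not give the paper's actual formulation, so the function names (`pool_frames`, `power_law_normalize`), the per-concept exponent list, and the example values are all hypothetical.

```python
import math


def pool_frames(frames, mode="average"):
    """Collapse per-frame concept scores into one video-level vector.

    frames: list of equal-length lists, one list of concept scores per frame.
    """
    if not frames:
        raise ValueError("need at least one frame descriptor")
    columns = list(zip(*frames))  # one tuple of scores per concept
    if mode == "average":
        return [sum(col) / len(col) for col in columns]
    if mode == "max":
        return [max(col) for col in columns]
    raise ValueError(f"unknown pooling mode: {mode}")


def power_law_normalize(vector, alphas):
    """Signed power law normalization followed by L2 normalization.

    alphas: a single exponent (the usual global power law) or a list with
    one exponent per concept (an assumed stand-in for a concept-wise variant,
    not the paper's confirmed method).
    """
    if isinstance(alphas, (int, float)):
        alphas = [alphas] * len(vector)
    if len(alphas) != len(vector):
        raise ValueError("need exactly one exponent per concept")
    # sign(x) * |x|^alpha keeps the sign while compressing large magnitudes
    powered = [math.copysign(abs(x) ** a, x) for x, a in zip(vector, alphas)]
    norm = math.sqrt(sum(v * v for v in powered))
    return [v / norm for v in powered] if norm > 0 else powered


# Two frames, two concepts (illustrative scores only).
frames = [[9.0, 0.0], [9.0, 32.0]]
video_avg = pool_frames(frames, "average")              # [9.0, 16.0]
global_pln = power_law_normalize(video_avg, 0.5)        # one shared exponent
concept_pln = power_law_normalize(video_avg, [0.5, 1.0])  # per-concept exponents
```

With a shared exponent every concept is compressed the same way. Per-concept exponents let each concept's score distribution be rescaled separately, which is the general idea the term "concept-wise" suggests. How the paper chooses its exponents is not stated in the abstract.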
