Journal: Multimedia Tools and Applications

Complex event detection via attention-based video representation and classification


Abstract

As an important task in managing unconstrained web videos, multimedia event detection (MED) has attracted wide attention recently. However, MED remains challenging due to complexities such as the high abstraction of events, diverse scenes, and frequent interactions among individuals. In this paper, we propose a novel MED algorithm via attention-based video representation and classification. First, inspired by the human selective attention mechanism, an attention-based saliency localization network (ASLN) is constructed to quickly predict the semantically salient objects in video frames. Second, to represent salient objects and their surroundings complementarily, two Convolutional Neural Network (CNN) features, a local saliency feature and a global feature, are extracted from the salient objects and the whole feature map, respectively. Third, after binding the two features together, the Vector of Locally Aggregated Descriptors (VLAD) is applied to encode them into the video representation. Finally, linear Support Vector Machine (SVM) classifiers are trained for classification. We extensively evaluate the performance on the TRECVID MED14_10Ex, MED14_100Ex, and Columbia Consumer Video (CCV) datasets. Experimental results show that the proposed single model outperforms state-of-the-art approaches on all three real-world video datasets, demonstrating its effectiveness.
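The final two stages of the pipeline, VLAD encoding of per-video CNN descriptors followed by a linear SVM, can be sketched as below. This is a minimal illustration, not the authors' implementation: the ASLN saliency localization and the two CNN feature streams are assumed to have already produced a set of local descriptors per video, so random arrays stand in for them, and the codebook size, descriptor dimension, and normalization choices are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def vlad_encode(descriptors, kmeans):
    """Aggregate a video's local descriptors into one VLAD vector:
    sum residuals to the nearest codebook centroid, then normalize."""
    k, d = kmeans.cluster_centers_.shape
    assignments = kmeans.predict(descriptors)
    vlad = np.zeros((k, d))
    for i in range(k):
        members = descriptors[assignments == i]
        if len(members):
            vlad[i] = (members - kmeans.cluster_centers_[i]).sum(axis=0)
    vlad = vlad.flatten()
    # Power (signed square-root) normalization followed by L2 normalization.
    vlad = np.sign(vlad) * np.sqrt(np.abs(vlad))
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad

rng = np.random.default_rng(0)
# Stand-ins: 40 videos, each with 50 local CNN descriptors of dimension 64.
videos = [rng.normal(size=(50, 64)) for _ in range(40)]
labels = rng.integers(0, 2, size=40)  # hypothetical event / non-event labels

# Learn an 8-word codebook over all descriptors, encode each video, train SVM.
codebook = KMeans(n_clusters=8, n_init=10, random_state=0).fit(np.vstack(videos))
X = np.array([vlad_encode(v, codebook) for v in videos])
clf = LinearSVC(C=1.0).fit(X, labels)
```

Each video thus becomes a single fixed-length vector (codebook size x descriptor dimension), which is what makes a linear classifier practical for videos with varying numbers of frames.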
