Journal: Multimedia Tools and Applications

Complex event detection via attention-based video representation and classification


Abstract

As an important task in managing unconstrained web videos, multimedia event detection (MED) has attracted wide attention recently. However, MED remains challenging due to complexities such as the high abstraction of events, diverse scenes, and frequent interactions among individuals. In this paper, we propose a novel MED algorithm via attention-based video representation and classification. First, inspired by the human selective attention mechanism, an attention-based saliency localization network (ASLN) is constructed to quickly predict the semantically salient objects in video frames. Second, to represent salient objects and their surroundings complementarily, two Convolutional Neural Network (CNN) features, a local saliency feature and a global feature, are extracted from the salient objects and the whole feature map, respectively. Third, after binding the two features together, the Vector of Locally Aggregated Descriptors (VLAD) is applied to encode them into the video representation. Finally, linear Support Vector Machine (SVM) classifiers are trained for classification. We extensively evaluate the performance on the TRECVID MED14_10Ex, MED14_100Ex, and Columbia Consumer Video (CCV) datasets. Experimental results show that the proposed single model outperforms state-of-the-art approaches on all three real-world video datasets, demonstrating its effectiveness.
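The final two stages of the pipeline, VLAD encoding of per-video CNN descriptors followed by a linear SVM, can be sketched as below. This is a minimal illustration, not the authors' implementation: the ASLN saliency localization and the two CNN feature streams are assumed to have already produced a set of local descriptors per video, so random arrays stand in for them, and the codebook size, descriptor dimension, and normalization choices are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def vlad_encode(descriptors, kmeans):
    """Aggregate a video's local descriptors into one VLAD vector:
    sum residuals to the nearest codebook centroid, then normalize."""
    k, d = kmeans.cluster_centers_.shape
    assignments = kmeans.predict(descriptors)
    vlad = np.zeros((k, d))
    for i in range(k):
        members = descriptors[assignments == i]
        if len(members):
            vlad[i] = (members - kmeans.cluster_centers_[i]).sum(axis=0)
    vlad = vlad.flatten()
    # Power (signed square-root) normalization followed by L2 normalization.
    vlad = np.sign(vlad) * np.sqrt(np.abs(vlad))
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad

rng = np.random.default_rng(0)
# Stand-ins: 40 videos, each with 50 local CNN descriptors of dimension 64.
videos = [rng.normal(size=(50, 64)) for _ in range(40)]
labels = rng.integers(0, 2, size=40)  # hypothetical event / non-event labels

# Learn an 8-word codebook over all descriptors, encode each video, train SVM.
codebook = KMeans(n_clusters=8, n_init=10, random_state=0).fit(np.vstack(videos))
X = np.array([vlad_encode(v, codebook) for v in videos])
clf = LinearSVC(C=1.0).fit(X, labels)
```

Each video thus becomes a single fixed-length vector (codebook size x descriptor dimension), which is what makes a linear classifier practical for videos with varying numbers of frames.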
