IEEE Transactions on Image Processing

Deep Fusion of Multiple Semantic Cues for Complex Event Recognition

Abstract

We present a deep learning strategy to fuse multiple semantic cues for complex event recognition. In particular, we tackle the recognition task by answering how to jointly analyze human actions (who is doing what), objects (what), and scenes (where). First, each type of semantic feature (e.g., human action trajectories) is fed into a corresponding multi-layer feature abstraction pathway, followed by a fusion layer connecting all the different pathways. Second, the correlations among the semantic cues, i.e., how they interact with each other, are learned in an unsupervised, cross-modality autoencoder fashion. Finally, by fine-tuning a large-margin objective deployed on this deep architecture, we are able to answer how the semantic cues of who, what, and where compose a complex event. Compared with traditional feature fusion methods (e.g., various early or late fusion strategies), our method jointly learns the higher-level features that are most effective for fusion and recognition. We perform extensive experiments on two real-world complex event video benchmarks, MED'11 and CCV, and demonstrate that our method outperforms the best published results by 21% and 11%, respectively, on an event recognition task.
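To make the three-stage recipe concrete, below is a minimal PyTorch sketch (not the authors' code) of the pipeline the abstract describes: per-cue abstraction pathways, a fusion layer connecting them, unsupervised cross-modality autoencoder pretraining, and large-margin fine-tuning. All dimensions, layer counts, and names (ACTION_DIM, DeepFusion, etc.) are illustrative assumptions.

```python
# Minimal sketch of the described deep fusion architecture (assumed sizes).
import torch
import torch.nn as nn

ACTION_DIM, OBJECT_DIM, SCENE_DIM = 4000, 1000, 500  # assumed cue dimensions
HIDDEN, FUSED, NUM_EVENTS = 512, 256, 20             # assumed layer sizes

def pathway(in_dim):
    # Multi-layer feature abstraction pathway for one semantic cue.
    return nn.Sequential(nn.Linear(in_dim, HIDDEN), nn.ReLU(),
                         nn.Linear(HIDDEN, HIDDEN), nn.ReLU())

class DeepFusion(nn.Module):
    def __init__(self):
        super().__init__()
        self.action = pathway(ACTION_DIM)   # who is doing what
        self.object = pathway(OBJECT_DIM)   # what
        self.scene = pathway(SCENE_DIM)     # where
        # Fusion layer connecting all pathways.
        self.fuse = nn.Sequential(nn.Linear(3 * HIDDEN, FUSED), nn.ReLU())
        # Decoder used only for unsupervised cross-modality pretraining:
        # reconstruct every cue's hidden code from the fused representation.
        self.decode = nn.Linear(FUSED, 3 * HIDDEN)
        self.classify = nn.Linear(FUSED, NUM_EVENTS)

    def forward(self, a, o, s):
        h = torch.cat([self.action(a), self.object(o), self.scene(s)], dim=1)
        return self.fuse(h), h

model = DeepFusion()
a, o, s = (torch.randn(8, ACTION_DIM), torch.randn(8, OBJECT_DIM),
           torch.randn(8, SCENE_DIM))
z, h = model(a, o, s)

# Stage 1: cross-modality autoencoder pretraining (unsupervised) --
# the fused code must reconstruct the hidden codes of all cues.
recon_loss = nn.MSELoss()(model.decode(z), h.detach())

# Stage 2: large-margin fine-tuning, here with a multi-class hinge objective.
labels = torch.randint(0, NUM_EVENTS, (8,))
margin_loss = nn.MultiMarginLoss()(model.classify(z), labels)

(recon_loss + margin_loss).backward()  # in practice, the stages run separately
```

In this reading, early fusion would concatenate the raw cue features before any learning, and late fusion would average per-cue classifier scores; the fused layer here instead sits between the pathways and the classifier, so the cross-cue correlations themselves are learned.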