IEEE Transactions on Image Processing

Deep Fusion of Multiple Semantic Cues for Complex Event Recognition

Abstract

We present a deep learning strategy to fuse multiple semantic cues for complex event recognition. In particular, we tackle the recognition task by answering how to jointly analyze human actions (who is doing what), objects (what), and scenes (where). First, each type of semantic feature (e.g., human action trajectories) is fed into a corresponding multi-layer feature abstraction pathway, followed by a fusion layer connecting all the different pathways. Second, the correlations among the semantic cues, i.e., how they interact with each other, are learned in an unsupervised, cross-modality autoencoder fashion. Finally, by fine-tuning a large-margin objective deployed on this deep architecture, we are able to answer how the semantic cues of who, what, and where compose a complex event. Compared with traditional feature fusion methods (e.g., various early or late fusion strategies), our method jointly learns the higher-level features that are most effective for fusion and recognition. We perform extensive experiments on two real-world complex event video benchmarks, MED'11 and CCV, and demonstrate that our method outperforms the best published results by 21% and 11%, respectively, on an event recognition task.
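To make the three-stage recipe concrete, below is a minimal PyTorch sketch (not the authors' code) of the pipeline the abstract describes: per-cue abstraction pathways, a fusion layer connecting them, unsupervised cross-modality autoencoder pretraining, and large-margin fine-tuning. All dimensions, layer counts, and names (ACTION_DIM, DeepFusion, etc.) are illustrative assumptions.

```python
# Minimal sketch of the described deep fusion architecture (assumed sizes).
import torch
import torch.nn as nn

ACTION_DIM, OBJECT_DIM, SCENE_DIM = 4000, 1000, 500  # assumed cue dimensions
HIDDEN, FUSED, NUM_EVENTS = 512, 256, 20             # assumed layer sizes

def pathway(in_dim):
    # Multi-layer feature abstraction pathway for one semantic cue.
    return nn.Sequential(nn.Linear(in_dim, HIDDEN), nn.ReLU(),
                         nn.Linear(HIDDEN, HIDDEN), nn.ReLU())

class DeepFusion(nn.Module):
    def __init__(self):
        super().__init__()
        self.action = pathway(ACTION_DIM)   # who is doing what
        self.object = pathway(OBJECT_DIM)   # what
        self.scene = pathway(SCENE_DIM)     # where
        # Fusion layer connecting all pathways.
        self.fuse = nn.Sequential(nn.Linear(3 * HIDDEN, FUSED), nn.ReLU())
        # Decoder used only for unsupervised cross-modality pretraining:
        # reconstruct every cue's hidden code from the fused representation.
        self.decode = nn.Linear(FUSED, 3 * HIDDEN)
        self.classify = nn.Linear(FUSED, NUM_EVENTS)

    def forward(self, a, o, s):
        h = torch.cat([self.action(a), self.object(o), self.scene(s)], dim=1)
        return self.fuse(h), h

model = DeepFusion()
a, o, s = (torch.randn(8, ACTION_DIM), torch.randn(8, OBJECT_DIM),
           torch.randn(8, SCENE_DIM))
z, h = model(a, o, s)

# Stage 1: cross-modality autoencoder pretraining (unsupervised) --
# the fused code must reconstruct the hidden codes of all cues.
recon_loss = nn.MSELoss()(model.decode(z), h.detach())

# Stage 2: large-margin fine-tuning, here with a multi-class hinge objective.
labels = torch.randint(0, NUM_EVENTS, (8,))
margin_loss = nn.MultiMarginLoss()(model.classify(z), labels)

(recon_loss + margin_loss).backward()  # in practice, the stages run separately
```

In this reading, early fusion would concatenate the raw cue features before any learning, and late fusion would average per-cue classifier scores; the fused layer here instead sits between the pathways and the classifier, so the cross-cue correlations themselves are learned.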