Journal: Machine Vision and Applications

Multimedia event detection with multimodal feature fusion and temporal concept localization


Abstract

We present a system for multimedia event detection. The system characterizes complex multimedia events using a large array of multimodal features and classifies unseen videos by effectively fusing diverse responses. It makes three main technical contributions. First, we explore novel visual and audio features across multiple semantic granularities; this includes building mid-level and high-level features on top of low-level features, often in an unsupervised manner, to enable semantic understanding. Second, we introduce a novel Latent SVM model that learns and localizes discriminative high-level concepts in cluttered video sequences. Besides improving detection accuracy over existing approaches, the model produces a summary for each retrieved video from the high-level concepts it detects and the temporal locations of the supporting evidence. This summary gives some transparency into why the system classified the video as it did. Finally, we present novel fusion learning algorithms, together with a methodology for improving fusion learning when training data is limited. A thorough evaluation on the large TRECVID MED 2011 dataset demonstrates the benefits of the system.
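The abstract does not give the model's formulation, so the following is only a minimal sketch of the two general ideas it names: latent-variable inference over time and late score fusion. It assumes a linear scoring function over per-segment concept responses, with the segment index as the latent variable, and a fixed weighted average for fusion. The function names, weights, and toy data are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple


def latent_score(
    segments: Sequence[Sequence[float]],
    weights: Sequence[float],
    bias: float = 0.0,
) -> Tuple[float, int]:
    """Score a video as its best-scoring segment under a linear model.

    Assumption: the latent variable is the segment index, and inference
    maximizes the linear score over it. Returns the video score and the
    index of the segment that supplies the evidence.
    """
    if not segments:
        raise ValueError("video has no segments")
    scores = [
        sum(w * x for w, x in zip(weights, seg)) + bias for seg in segments
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return scores[best], best


def fuse_scores(scores: Sequence[float], fusion_weights: Sequence[float]) -> float:
    """Late fusion: weighted average of per-modality detection scores."""
    total = sum(fusion_weights)
    if total <= 0:
        raise ValueError("fusion weights must sum to a positive value")
    return sum(w * s for w, s in zip(fusion_weights, scores)) / total


# Hypothetical toy input: 3 temporal segments x 2 concept-detector responses.
video = [[0.1, 0.0], [0.2, 0.9], [0.3, 0.1]]
score, where = latent_score(video, weights=[1.0, 2.0], bias=-1.0)

# Hypothetical fusion of a visual score and an audio score.
fused = fuse_scores([score, 0.0], fusion_weights=[3.0, 1.0])
```

The index returned by `latent_score` is what would let a system point to the moment in the video that drove the decision, which is the kind of temporal evidence the abstract describes. In a trained Latent SVM the weights are learned jointly with the latent assignments rather than fixed by hand as they are here.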
