
Audio-coupled video content understanding of unconstrained video sequences


Abstract

Unconstrained video understanding is a difficult task. The main aim of this thesis is to recognise the nature of the objects, activities and environment in a given video clip using both audio and video information. Traditionally, audio and video information have not been applied together to solve such a complex task, and for the first time we propose, develop, implement and test a new framework of multi-modal (audio and video) data analysis for context understanding and labelling of unconstrained videos. The framework relies on feature-selection techniques and introduces a novel algorithm (PCFS) that is faster than the well-established SFFS algorithm. We use the framework to study the benefits of combining audio and video information in a number of different problems. We begin by developing two independent content-recognition modules. The first is based on image-sequence analysis alone and uses a range of colour, shape, texture and statistical features extracted from image regions, together with a trained classifier, to recognise the objects, activities and environment present. The second module uses audio information only, and recognises activities and environment. Both approaches are preceded by detailed pre-processing to ensure that the correct video segments containing both audio and video content are present, and that the developed system is robust to changes in camera movement, illumination, random object behaviour, etc. For both audio and video analysis we use a hierarchical, multi-stage classification approach so that difficult classification tasks can be decomposed into simpler, smaller tasks. When combining the two modalities, we compare fusion techniques at different levels of integration and propose a novel algorithm that combines the advantages of both feature-level and decision-level fusion. The analysis is evaluated on a large amount of test data comprising unconstrained videos collected for this work. Finally, we propose a decision-correction algorithm which shows that further steps towards combining multi-modal classification information effectively with semantic knowledge generate the best possible results.
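The contrast between feature-level and decision-level fusion mentioned in the abstract can be illustrated with a minimal sketch; this is not the thesis's actual pipeline or its novel fusion algorithm, and the feature dimensions, classifier choice and fusion weights below are placeholder assumptions only.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)

    # Toy stand-ins for per-clip descriptors (hypothetical dimensions).
    X_video = rng.normal(size=(200, 32))   # e.g. colour/shape/texture features
    X_audio = rng.normal(size=(200, 13))   # e.g. audio spectral features
    y = rng.integers(0, 3, size=200)       # e.g. activity/environment labels

    tr, te = slice(None, 150), slice(150, None)

    # Feature-level (early) fusion: concatenate modalities, train one classifier.
    X_early = np.hstack([X_video, X_audio])
    early = RandomForestClassifier(random_state=0).fit(X_early[tr], y[tr])
    p_early = early.predict_proba(X_early[te])

    # Decision-level (late) fusion: train one classifier per modality, then
    # combine their class posteriors (here a simple weighted average).
    clf_v = RandomForestClassifier(random_state=0).fit(X_video[tr], y[tr])
    clf_a = RandomForestClassifier(random_state=0).fit(X_audio[tr], y[tr])
    p_late = 0.6 * clf_v.predict_proba(X_video[te]) + 0.4 * clf_a.predict_proba(X_audio[te])

    print("early-fusion accuracy:", (p_early.argmax(1) == y[te]).mean())
    print("late-fusion  accuracy:", (p_late.argmax(1) == y[te]).mean())

On synthetic data both scores sit at chance level; the point of the sketch is only where the modalities are merged, before or after classification.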

Bibliographic record

  • Author

    Lopes Jose E F C;

  • Author affiliation
  • Year: 2011
  • Total pages:
  • Original format: PDF
  • Language: English
  • CLC classification:
