
Semantics of video shots for content-based retrieval


Abstract

Content-based video retrieval research combines expertise from many different areas, such as signal processing, machine learning, pattern recognition, and computer vision. As video extends into both the spatial and the temporal domain, we require techniques for the temporal decomposition of footage so that specific content can be accessed. This content may then be semantically classified - ideally in an automated process - to enable filtering, browsing, and searching. An important aspect that must be considered is that the pictorial representation of information may be interpreted differently by individual users because it is less specific than its textual representation. In this thesis, we address several fundamental issues of content-based video retrieval for effective handling of digital footage. Temporal segmentation, the common first step in handling digital video, is the decomposition of video streams into smaller, semantically coherent entities. This is usually performed by detecting the transitions that separate individual camera takes. While abrupt transitions - cuts - can be detected relatively well with existing techniques, effective detection of gradual transitions remains difficult. We present our approach to temporal video segmentation, proposing a novel algorithm that evaluates sets of frames using a relatively simple histogram feature. Our technique has been shown to rank among the best existing shot segmentation algorithms in large-scale evaluations. The next step is the semantic classification of each video segment to generate an index for content-based retrieval in video databases. Machine learning techniques can be applied effectively to classify video content. However, these techniques require manually classified examples for training before automatic classification of unseen content can be carried out. Manually classifying training examples is not trivial because of the implied ambiguity of visual content.
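The segmentation step above compares simple histogram features between frames. As an illustrative sketch only - not the thesis algorithm; the frame data, bin count, and threshold are hypothetical - a basic histogram-difference cut detector might look like:

```python
# Minimal sketch of histogram-based cut detection. Illustrative only:
# the bin count and threshold are hypothetical choices, not those
# evaluated in the thesis.

def grey_histogram(frame, bins=8):
    """Count 8-bit grey values of a frame (a 2D list) into coarse bins."""
    hist = [0] * bins
    for row in frame:
        for pixel in row:
            hist[pixel * bins // 256] += 1
    return hist

def hist_distance(h1, h2):
    """Normalised L1 distance between two histograms, in [0, 1]."""
    total = sum(h1) + sum(h2)
    return sum(abs(a - b) for a, b in zip(h1, h2)) / total

def detect_cuts(frames, threshold=0.5):
    """Report frame indices where consecutive histograms differ sharply."""
    hists = [grey_histogram(f) for f in frames]
    return [i for i in range(1, len(hists))
            if hist_distance(hists[i - 1], hists[i]) > threshold]

# Synthetic footage: four dark frames followed by four bright frames.
def flat_frame(value, size=4):
    return [[value] * size for _ in range(size)]

clip = [flat_frame(10)] * 4 + [flat_frame(200)] * 4
print(detect_cuts(clip))  # → [4]
```

A pairwise comparison like this catches abrupt cuts but not gradual transitions, which motivates evaluating sets of frames rather than only consecutive pairs, as described above.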
We propose an unsupervised learning approach based on latent class modelling in which we obtain multiple judgements per video shot and model the users' response behaviour over a large collection of shots. This technique yields a more generic classification of the visual content. Moreover, it enables the quality assessment of the classification, and maximises the number of training examples by resolving disagreements. We apply this approach to data from a large-scale, collaborative annotation effort and present ways to improve the effectiveness of manual annotation of visual content through better design and specification of the process. Automatic speech recognition techniques, along with semantic classification of video content, can be used to implement video search using textual queries. This requires the application of text search techniques to video and the combination of different information sources. We explore several text-based query expansion techniques for speech-based video retrieval, and propose a fusion method to improve overall effectiveness. To combine both text and visual search approaches, we explore a fusion technique that combines spoken information and visual information using semantic keywords automatically assigned to the footage based on the visual content. The techniques that we propose help to facilitate effective content-based video retrieval and highlight the importance of considering different user interpretations of visual content. This allows a better understanding of video content and a more holistic approach to multimedia retrieval in the future.
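The combination of spoken and visual evidence described above can be illustrated with a simple late-fusion scheme. This is a hedged sketch, not the thesis method: it assumes min-max score normalisation and a weighted CombSUM over hypothetical per-shot scores, and all names and weights are made up for illustration.

```python
# Illustrative late fusion of text-based and visual-based retrieval
# scores (weighted CombSUM over min-max normalised scores). The shot
# names, scores, and weight are hypothetical, not the thesis setup.

def minmax(scores):
    """Rescale a {shot: score} dict to [0, 1]; constant scores map to 0."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {s: (v - lo) / span if span else 0.0 for s, v in scores.items()}

def fuse(text_scores, visual_scores, alpha=0.6):
    """Weighted sum of normalised scores; a shot missing from one
    source contributes 0 for that source."""
    t, v = minmax(text_scores), minmax(visual_scores)
    shots = set(t) | set(v)
    return {s: alpha * t.get(s, 0.0) + (1 - alpha) * v.get(s, 0.0)
            for s in shots}

text = {"shot_a": 3.0, "shot_b": 2.0, "shot_c": 1.0}  # e.g. ASR-transcript match
visual = {"shot_b": 0.8, "shot_c": 0.2}               # e.g. semantic-keyword match
ranked = sorted(fuse(text, visual).items(), key=lambda kv: -kv[1])
print(ranked[0][0])  # → shot_b
```

Here the fused ranking promotes a shot that scores moderately on both sources over one that scores highly on only one, which is the intuition behind combining spoken and visual evidence.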

Bibliographic Information

  • Author

    Volkmer T;

  • Author affiliation
  • Year 2007
  • Total pages
  • Original format PDF
  • Language
  • CLC classification
