
Segmentation, structure detection and summarization of multimedia sequences.


Abstract

This thesis investigates the problem of efficiently summarizing audio-visual sequences. The problem is important because consumers now have access to vast amounts of multimedia content that can be viewed on a range of devices.

The goal of this thesis is to provide an adaptive framework that automatically generates a short multimedia clip as a summary when given a longer multimedia segment as input. In our framework, the solution to the summarization problem is predicated on the solution to three important sub-problems: segmentation, structure detection, and audio-visual condensation of the data.

In the segmentation problem, we focus on the determination of computable scenes. These are segments of audio-visual data that are consistent with respect to certain low-level properties and that preserve the syntax of the original video. This work does not address the semantics of the segments, since that is not a well-posed problem. There are three novel ideas in our approach: (a) analysis of the effects of rules of production on the data; (b) a finite, causal memory model for segmenting audio and video; and (c) the use of top-down structural grouping rules that enable us to be consistent with human perception. These scenes form the input to our condensation algorithm.

In the problem of detecting structure, we propose a novel framework that analyzes the topology of the sequence. We limit our scope to discrete, temporal structures that have a priori known deterministic generative mechanisms. We show two general approaches to solving the problem, and we present robust algorithms for detecting two specific visual structures: the dialog and the regular anchor.

We propose a novel entity-utility framework for the problem of condensing audio-visual segments. The idea is that the multimedia sequence can be thought of as comprising entities, a subset of which will satisfy the user's information needs. We associate a utility with these entities and formulate the problem of preserving the entities required by the user as a convex utility maximization problem with constraints. The framework allows for adaptability to changing device and other resource conditions. Other original contributions include: (a) the idea that comprehension of a shot is related to its visual complexity; (b) the idea that the preservation of visual syntax is necessary for the generation of coherent multimedia summaries; (c) auditory analysis that uses discourse structure; and (d) novel multimedia synchronization requirements.

We conducted user studies using the multimedia summary clips generated by the system. These user studies indicate that the summaries are perceived as coherent at condensation rates as high as 90%. The studies also revealed that the measurable improvements over competing algorithms were statistically significant.
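As a rough illustration of the constrained utility-maximization idea mentioned in the abstract, the sketch below splits a fixed time budget across shots. It is not the thesis's formulation: the abstract does not state the utility function or the constraints, so the logarithmic utility, the per-shot duration caps, and the names `allocate_durations`, `weights`, `max_durations`, and `step` are all assumptions made for this example.

```python
import heapq
import math


def allocate_durations(weights, max_durations, budget, step=1.0):
    """Greedily split `budget` seconds across shots.

    Assumption (not from the thesis): shot i has utility
    weights[i] * log(1 + t_i), which is concave in its allotted time t_i,
    and t_i may not exceed max_durations[i] (the shot's original length).
    For separable concave utilities, repeatedly granting one `step` to the
    shot with the largest marginal gain is optimal on the discretized grid.
    """
    alloc = [0.0] * len(weights)

    def gain(i):
        # Marginal utility of giving shot i one more `step` of time.
        t = alloc[i]
        return weights[i] * (math.log1p(t + step) - math.log1p(t))

    # Max-heap (via negated gains) of shots that can still accept a step.
    heap = [(-gain(i), i) for i in range(len(weights)) if max_durations[i] >= step]
    heapq.heapify(heap)

    remaining = budget
    while heap and remaining >= step:
        _, i = heapq.heappop(heap)
        alloc[i] += step
        remaining -= step
        # Re-queue the shot only if it has room for another step.
        if alloc[i] + step <= max_durations[i]:
            heapq.heappush(heap, (-gain(i), i))
    return alloc
```

Under these assumptions, a tighter budget (for example, a device with less display time available) simply shortens the low-weight shots first, which is one way to picture the adaptability to resource conditions that the abstract describes.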
