
Learning, detection, representation, indexing and retrieval of multi-agent events in videos.



Abstract

The world that we live in is a complex network of agents and their interactions, which are termed events. An instance of an event is composed of directly measurable low-level actions (which I term sub-events) having a temporal order. Also, the agents can act independently (e.g., voting) as well as collectively (e.g., scoring a touchdown in a football game) to perform an event. With the dawn of the new millennium, low-level vision tasks such as segmentation, object classification, and tracking have become fairly robust. But a representational gap still exists between low-level measurements and high-level understanding of video sequences. This dissertation is an effort to bridge that gap, in which I propose novel learning, detection, representation, indexing, and retrieval approaches for multi-agent events in videos.

In order to achieve the goal of high-level understanding of videos, I first apply statistical learning techniques to model multi-agent events. For that purpose, I use the training videos to model the events by estimating the conditional dependencies between sub-events. Thus, given a video sequence, I track the people (head and hand regions) and objects using a Meanshift tracker. An underlying rule-based system detects the sub-events from the tracked trajectories of the people and objects, based on their relative motion. Next, an event model is constructed by estimating the sub-event dependencies, that is, how frequently sub-event B occurs given that sub-event A has occurred. The advantages of such an event model are two-fold. First, I do not require prior knowledge of the number of agents involved in an event. Second, no assumptions are made about the length of an event.

Secondly, after learning the event models, I detect events in a novel video using graph clustering techniques. To that end, I construct a graph of the temporally ordered sub-events occurring in the novel video.
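The dependency-estimation step described above can be sketched roughly as follows. The sub-event labels and the counting scheme here are illustrative assumptions for a toy example, not the dissertation's actual implementation:

```python
from collections import defaultdict

def estimate_dependencies(training_sequences):
    """Estimate conditional dependencies between sub-events, i.e. how
    frequently sub-event B occurs after sub-event A, given that A occurred.

    Each training sequence is a list of sub-event labels in temporal order.
    Returns a dict mapping (A, B) -> estimated dependency in [0, 1].
    """
    follows = defaultdict(int)   # count of "B occurs somewhere after A"
    occurs = defaultdict(int)    # count of sequences in which A occurs

    for seq in training_sequences:
        seen = set()
        for i, a in enumerate(seq):
            if a not in seen:        # count each sub-event once per sequence
                occurs[a] += 1
                seen.add(a)
                for b in set(seq[i + 1:]):
                    follows[(a, b)] += 1

    return {(a, b): n / occurs[a] for (a, b), n in follows.items()}

# Toy example: two training videos of a hypothetical "hand over object" event.
videos = [
    ["approach", "extend_hand", "grasp", "withdraw"],
    ["approach", "grasp", "withdraw"],
]
deps = estimate_dependencies(videos)
```

Because the estimate is a simple relative frequency, it requires no prior knowledge of how many agents participate or how long an event lasts, which mirrors the two advantages noted above.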
Next, using the learnt event model, I estimate a weight matrix of conditional dependencies between sub-events in the novel video. Applying the Normalized Cut graph clustering technique to the estimated weight matrix then facilitates detecting events in the novel video. The principal assumption made in this work is that events are composed of highly correlated chains of sub-events that have high conditional dependency (association) within a cluster and relatively low conditional dependency (disassociation) between clusters.

Thirdly, in order to represent the detected events, I propose an extension of the CASE representation of natural languages. I extend CASE to allow the representation of temporal structure between sub-events. Also, in order to capture both multi-agent and multi-threaded events, I introduce a hierarchical CASE representation of events in terms of sub-events and case-lists. The essence of the proposition is that, based on the temporal relationships of the agents' motions and a description of their states, it is possible to build a formal description of an event. Furthermore, I recognize the importance of representing the variations in the temporal order of sub-events that may occur in an event, and encode the temporal probabilities directly into my event representation. The proposed extended representation with probabilistic temporal encoding is termed P-CASE, which allows a plausible means of interface between users and the computer. Using the P-CASE representation, I automatically encode the event ontology from training videos. This offers a significant advantage, since domain experts do not have to go through the tedious task of determining the structure of events by browsing all the videos.

Finally, I utilize the event representation for indexing and retrieval of events. Given the different instances of a particular event, I index the events using the P-CASE representation.
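The clustering assumption above (high association within a cluster, low disassociation between clusters) is exactly the Normalized-Cut criterion. A minimal illustration on a small sub-event weight matrix follows; it brute-forces all bipartitions for clarity, whereas practical implementations use the spectral relaxation of Shi and Malik, and the matrix values are made up for the example:

```python
from itertools import combinations

def normalized_cut(W):
    """Find the bipartition (A, B) of nodes 0..n-1 minimising
    Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V)
    by exhaustive search (feasible only for small graphs).
    W is a symmetric weight matrix given as a list of lists."""
    n = len(W)
    nodes = set(range(n))
    best, best_ncut = None, float("inf")
    for size in range(1, n // 2 + 1):
        for a in combinations(range(n), size):
            A, B = set(a), nodes - set(a)
            cut = sum(W[i][j] for i in A for j in B)
            assoc_a = sum(W[i][j] for i in A for j in nodes)
            assoc_b = sum(W[i][j] for i in B for j in nodes)
            if assoc_a == 0 or assoc_b == 0:
                continue
            ncut = cut / assoc_a + cut / assoc_b
            if ncut < best_ncut:
                best, best_ncut = (A, B), ncut
    return best, best_ncut

# Two chains of sub-events with high within-chain dependency (nodes 0-2
# and nodes 3-5) joined by a single weak cross-link: two distinct events.
W = [
    [0, 9, 8, 1, 0, 0],
    [9, 0, 9, 0, 0, 0],
    [8, 9, 0, 0, 0, 0],
    [1, 0, 0, 0, 9, 8],
    [0, 0, 0, 9, 0, 9],
    [0, 0, 0, 8, 9, 0],
]
(A, B), ncut = normalized_cut(W)  # recovers the two chains as clusters
```

The cut correctly severs only the weak cross-link, so each recovered cluster corresponds to one highly correlated chain of sub-events, i.e. one event.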
Next, given a query in the P-CASE representation, event retrieval is performed using a two-level search. At the first level, a m
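The hierarchical P-CASE representation and the coarse first level of the two-level search might be sketched as a simple data structure. All class and field names here are hypothetical illustrations, not the dissertation's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class SubEvent:
    action: str          # e.g. "approach", "grasp"
    agents: list         # agent identifiers from the case-list
    # probability that this sub-event follows its predecessor, estimated
    # from training videos: the probabilistic temporal encoding of P-CASE
    temporal_prob: float = 1.0

@dataclass
class PCaseEvent:
    name: str
    sub_events: list = field(default_factory=list)

    def matches(self, query):
        """First level of a two-level search: coarse filtering on the set
        of sub-event actions, before any finer structural matching."""
        have = {s.action for s in self.sub_events}
        want = {s.action for s in query.sub_events}
        return want <= have

# An indexed event instance and a query, both in the same representation.
handover = PCaseEvent("hand_over", [
    SubEvent("approach", ["person_1"], 1.0),
    SubEvent("extend_hand", ["person_1"], 0.9),
    SubEvent("grasp", ["person_2"], 0.95),
])
query = PCaseEvent("query", [SubEvent("grasp", ["?"])])
```

Expressing both the indexed events and the query in the same representation is what makes the two-level search possible: a cheap set-level filter first, then a finer match on temporal structure.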

Record details

  • Author

    Hakeem, Asaad.

  • Institution

    University of Central Florida.

  • Degree grantor: University of Central Florida.
  • Subject: Computer Science.
  • Degree: Ph.D.
  • Year: 2007
  • Pages: 158 p.
  • Total pages: 158
  • Format: PDF
  • Language: eng
  • Classification (CLC): Automation technology; computer technology
  • Keywords

