ACM international conference on Multimedia

Detecting video events based on action recognition in complex scenes using spatio-temporal descriptor

Abstract

Event detection plays an essential role in video content analysis and remains a challenging open problem. In particular, research on detecting human-related video events in complex scenes containing both crowds of people and dynamic motion is still limited. In this paper, we investigate detecting video events that involve elementary human actions, e.g., making a cellphone call, putting an object down, and pointing at something, in complex scenes using a novel spatio-temporal descriptor based approach. We propose a new spatio-temporal descriptor that temporally integrates the statistics of a set of response maps of low-level features, e.g., image gradients and optical flow, within a space-time cube, capturing the characteristics of actions in terms of their appearance and motion patterns. Based on these descriptors, the bag-of-words method is used to describe a human figure as a concise feature vector. These features are then employed to train SVM classifiers at multiple spatial pyramid levels to distinguish different actions. Finally, Gaussian kernel based temporal filtering is applied to segment event sequences from a video stream, taking into account the temporal consistency of actions. The proposed approach tolerates spatial layout variations and local deformations of human actions caused by diverse view angles and rough human figure alignment in complex scenes. Extensive experiments on the 50-hour video dataset of the TRECVid 2008 event detection task demonstrate that our approach outperforms well-known SIFT descriptor based methods and effectively detects video events under challenging real-world conditions.
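
To make the pipeline concrete, the sketch below walks through the stages named in the abstract: space-time cube descriptors built from low-level response maps, bag-of-words encoding of a human figure, SVM classification, and Gaussian temporal filtering. It is not the authors' implementation: it assumes NumPy and scikit-learn, substitutes a simple frame-difference map for optical flow, omits the multi-level spatial pyramid, and all function names and parameter values are illustrative.

```python
# Minimal, illustrative sketch of the descriptor + bag-of-words + SVM +
# temporal-filtering pipeline. Not the paper's code; all names are assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC


def cube_descriptor(cube):
    """Integrate statistics of low-level response maps (spatial gradients and
    a frame-difference motion proxy) over a space-time cube of shape (T, H, W)."""
    cube = cube.astype(np.float32)
    gy, gx = np.gradient(cube, axis=(1, 2))        # spatial gradient maps
    motion = np.abs(np.diff(cube, axis=0))         # crude stand-in for optical flow
    stats = []
    for maps in (np.abs(gx), np.abs(gy), motion):
        pooled = maps.mean(axis=0)                 # temporal integration of the maps
        hist, _ = np.histogram(pooled, bins=8, range=(0.0, 255.0))
        stats.append(hist / (hist.sum() + 1e-8))
    return np.concatenate(stats)                   # compact per-cube descriptor


def bag_of_words(descriptors, codebook):
    """Quantize cube descriptors against a codebook and return a normalized
    word-frequency vector describing one human figure (no spatial pyramid here)."""
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / (hist.sum() + 1e-8)


def gaussian_temporal_filter(scores, sigma=2.0):
    """Smooth per-window classifier scores with a Gaussian kernel so that event
    segments respect the temporal consistency of actions."""
    radius = int(3 * sigma)
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-t ** 2 / (2 * sigma ** 2))
    kernel /= kernel.sum()
    return np.convolve(scores, kernel, mode="same")


# Toy end-to-end run on random data, only to show the data flow.
rng = np.random.default_rng(0)
cubes = [rng.integers(0, 256, (10, 32, 32)) for _ in range(200)]
descs = np.array([cube_descriptor(c) for c in cubes])
codebook = KMeans(n_clusters=16, n_init=10, random_state=0).fit(descs)
figures = np.array([bag_of_words(descs[i:i + 10], codebook)
                    for i in range(0, len(descs), 10)])
labels = np.arange(len(figures)) % 2               # stand-in action labels
clf = SVC(kernel="rbf", probability=True).fit(figures, labels)
scores = clf.predict_proba(figures)[:, 1]          # per-window action scores
smoothed = gaussian_temporal_filter(scores)        # segment event sequences from these
```

In the paper's setting, the cubes would be extracted from roughly aligned human-figure regions, descriptors would use true optical-flow and gradient response maps, and SVM scores would be produced at multiple spatial pyramid levels before the Gaussian temporal filtering segments event sequences from the video stream.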
