ACM international conference on Multimedia

Detecting video events based on action recognition in complex scenes using spatio-temporal descriptor


Abstract

Event detection plays an essential role in video content analysis and remains a challenging open problem. In particular, research on detecting human-related video events in complex scenes that contain both crowds and dynamic motion is still limited. In this paper, we investigate detecting video events that involve elementary human actions, e.g. making a cellphone call, putting an object down, and pointing to something, in complex scenes using a novel approach based on a spatio-temporal descriptor. We propose a new spatio-temporal descriptor that temporally integrates the statistics of a set of response maps of low-level features, e.g. image gradients and optical flow, in a space-time cube, in order to capture the appearance and motion patterns of actions. Based on these descriptors, the bag-of-words method is used to describe a human figure as a concise feature vector. These features are then used to train SVM classifiers at multiple spatial pyramid levels to distinguish different actions. Finally, Gaussian-kernel-based temporal filtering is applied to segment event sequences from a video stream, taking into account the temporal consistency of actions. The proposed approach tolerates the spatial layout variations and local deformations of human actions that arise from diverse view angles and rough human figure alignment in complex scenes. Extensive experiments on the 50-hour video dataset of the TRECVid 2008 event detection task demonstrate that our approach outperforms the well-known SIFT descriptor based methods and effectively detects video events in challenging real-world conditions.
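The final step of the abstract, Gaussian kernel based temporal filtering, can be illustrated with a minimal sketch. The abstract does not give the kernel width, the decision threshold, or the form of the per-frame classifier output, so the names `smooth_scores` and `segment_events`, the parameters `sigma` and `threshold`, and the assumption that each frame carries a single score in [0, 1] are all hypothetical choices made for illustration, not the authors' implementation.

```python
import numpy as np


def gaussian_kernel(sigma):
    """Return a normalized 1-D Gaussian kernel truncated at 3 * sigma (assumed cutoff)."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()


def smooth_scores(scores, sigma=2.0):
    """Smooth per-frame action scores along time with a Gaussian kernel.

    Edge padding keeps the output the same length as the input.
    """
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    padded = np.pad(np.asarray(scores, dtype=float), radius, mode="edge")
    return np.convolve(padded, kernel, mode="valid")


def segment_events(scores, sigma=2.0, threshold=0.5):
    """Return (start, end) frame index pairs where the smoothed score exceeds threshold.

    Both sigma and threshold are illustrative values, not taken from the paper.
    """
    active = smooth_scores(scores, sigma) > threshold
    segments = []
    start = None
    for i, flag in enumerate(active):
        if flag and start is None:
            start = i  # an event run begins
        elif not flag and start is not None:
            segments.append((start, i - 1))  # the run ended on the previous frame
            start = None
    if start is not None:
        segments.append((start, len(active) - 1))  # run reaches the last frame
    return segments


if __name__ == "__main__":
    # Hypothetical per-frame scores: a sustained action plus one isolated false alarm.
    frame_scores = [0.0] * 20 + [1.0] * 20 + [0.0] * 20
    frame_scores[5] = 1.0
    print(segment_events(frame_scores))
```

In this sketch the smoothing suppresses isolated single-frame detections while sustained runs of high scores survive as segments, which is one plausible reading of "taking into account the temporal consistency of actions".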
