

Detecting video events based on action recognition in complex scenes using spatio-temporal descriptor



Abstract

Event detection plays an essential role in video content analysis and remains a challenging open problem. In particular, research on detecting human-related video events in complex scenes containing both crowds and dynamic motion is still limited. In this paper, we investigate detecting video events that involve elementary human actions, e.g., making a cellphone call, putting an object down, and pointing to something, in complex scenes using a novel approach based on a spatio-temporal descriptor. The proposed descriptor temporally integrates the statistics of a set of response maps of low-level features, e.g., image gradients and optical flow, in a space-time cube to capture the characteristics of actions in terms of their appearance and motion patterns. Based on these descriptors, the bag-of-words method is used to describe a human figure as a concise feature vector. These features are then used to train SVM classifiers at multiple spatial pyramid levels to distinguish different actions. Finally, Gaussian kernel based temporal filtering is applied to segment the sequences of events from a video stream, taking into account the temporal consistency of actions. The proposed approach tolerates spatial layout variations and local deformations of human actions caused by diverse view angles and rough human figure alignment in complex scenes. Extensive experiments on the 50-hour video dataset of the TRECVid 2008 event detection task demonstrate that our approach outperforms well-known SIFT descriptor based methods and effectively detects video events in challenging real-world conditions.
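The final step of the abstract, Gaussian kernel based temporal filtering followed by segmentation, can be illustrated with a minimal sketch. The abstract does not give the kernel width, the decision threshold, or the minimum event length, so `sigma`, `threshold`, `min_length`, the function names, and the sample scores below are all assumptions made for illustration, not the authors' settings. The per-frame scores stand in for the output of an action classifier.

```python
import math


def gaussian_kernel(sigma, radius=None):
    """Return a normalized 1-D Gaussian kernel as a list of floats."""
    if radius is None:
        radius = max(1, int(round(3 * sigma)))  # assumed cutoff at about 3 sigma
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_scores(scores, sigma=2.0):
    """Smooth per-frame scores over time, renormalizing the kernel at the borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    smoothed = []
    for t in range(len(scores)):
        acc = 0.0
        norm = 0.0
        for k, w in enumerate(kernel):
            i = t + k - radius
            if 0 <= i < len(scores):
                acc += w * scores[i]
                norm += w
        smoothed.append(acc / norm)
    return smoothed


def segment_events(smoothed, threshold=0.5, min_length=3):
    """Return (start, end) frame ranges, end exclusive, where the score stays >= threshold."""
    segments = []
    start = None
    for t, s in enumerate(smoothed):
        if s >= threshold and start is None:
            start = t  # a candidate event begins
        elif s < threshold and start is not None:
            if t - start >= min_length:
                segments.append((start, t))
            start = None  # the candidate ended; short runs are discarded
    if start is not None and len(smoothed) - start >= min_length:
        segments.append((start, len(smoothed)))
    return segments


# Hypothetical per-frame classifier scores for one action class.
raw = [0.1, 0.0, 0.9, 0.1, 0.0, 0.8, 0.9, 0.2, 0.9, 0.8, 0.9, 0.1, 0.0, 0.1]
smoothed = smooth_scores(raw, sigma=1.0)
print(segment_events(smoothed, threshold=0.5, min_length=3))
```

Smoothing before thresholding is what enforces temporal consistency in this sketch: an isolated high score is pulled down by its low-scoring neighbors, while a sustained run of high scores survives a single dip in the middle.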


Bibliographic record

Source: ACM international conference on Multimedia | 2009 | pp. 165-174 | 10 pages
Authors: Guangyu Zhu; Ming Yang; Kai Yu; Wei Xu; Yihong Gong
Original format: PDF
CLC classification: Multimedia technology and multimedia computers
Keywords: action recognition; event detection; motion representation; semantic analysis

