ACM international conference on Multimedia

Detecting video events based on action recognition in complex scenes using spatio-temporal descriptor

Abstract

Event detection plays an essential role in video content analysis and remains a challenging open problem. In particular, research on detecting human-related video events in complex scenes containing both crowds of people and dynamic motion is still limited. In this paper, we investigate detecting video events that involve elementary human actions, e.g., making a cellphone call, putting an object down, and pointing at something, in complex scenes using a novel spatio-temporal descriptor based approach. We propose a new spatio-temporal descriptor that temporally integrates the statistics of a set of response maps of low-level features, e.g., image gradients and optical flow, within a space-time cube, capturing the characteristics of actions in terms of their appearance and motion patterns. Based on these descriptors, the bag-of-words method is used to describe a human figure as a concise feature vector. These features are then employed to train SVM classifiers at multiple spatial pyramid levels to distinguish different actions. Finally, Gaussian kernel based temporal filtering is applied to segment event sequences from a video stream, taking into account the temporal consistency of actions. The proposed approach tolerates spatial layout variations and local deformations of human actions caused by diverse view angles and rough human figure alignment in complex scenes. Extensive experiments on the 50-hour video dataset of the TRECVid 2008 event detection task demonstrate that our approach outperforms well-known SIFT descriptor based methods and effectively detects video events under challenging real-world conditions.
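
To make the pipeline concrete, the sketch below walks through the stages named in the abstract: space-time cube descriptors built from low-level response maps, bag-of-words encoding of a human figure, SVM classification, and Gaussian temporal filtering. It is not the authors' implementation: it assumes NumPy and scikit-learn, substitutes a simple frame-difference map for optical flow, omits the multi-level spatial pyramid, and all function names and parameter values are illustrative.

```python
# Minimal, illustrative sketch of the descriptor + bag-of-words + SVM +
# temporal-filtering pipeline. Not the paper's code; all names are assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC


def cube_descriptor(cube):
    """Integrate statistics of low-level response maps (spatial gradients and
    a frame-difference motion proxy) over a space-time cube of shape (T, H, W)."""
    cube = cube.astype(np.float32)
    gy, gx = np.gradient(cube, axis=(1, 2))        # spatial gradient maps
    motion = np.abs(np.diff(cube, axis=0))         # crude stand-in for optical flow
    stats = []
    for maps in (np.abs(gx), np.abs(gy), motion):
        pooled = maps.mean(axis=0)                 # temporal integration of the maps
        hist, _ = np.histogram(pooled, bins=8, range=(0.0, 255.0))
        stats.append(hist / (hist.sum() + 1e-8))
    return np.concatenate(stats)                   # compact per-cube descriptor


def bag_of_words(descriptors, codebook):
    """Quantize cube descriptors against a codebook and return a normalized
    word-frequency vector describing one human figure (no spatial pyramid here)."""
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / (hist.sum() + 1e-8)


def gaussian_temporal_filter(scores, sigma=2.0):
    """Smooth per-window classifier scores with a Gaussian kernel so that event
    segments respect the temporal consistency of actions."""
    radius = int(3 * sigma)
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-t ** 2 / (2 * sigma ** 2))
    kernel /= kernel.sum()
    return np.convolve(scores, kernel, mode="same")


# Toy end-to-end run on random data, only to show the data flow.
rng = np.random.default_rng(0)
cubes = [rng.integers(0, 256, (10, 32, 32)) for _ in range(200)]
descs = np.array([cube_descriptor(c) for c in cubes])
codebook = KMeans(n_clusters=16, n_init=10, random_state=0).fit(descs)
figures = np.array([bag_of_words(descs[i:i + 10], codebook)
                    for i in range(0, len(descs), 10)])
labels = np.arange(len(figures)) % 2               # stand-in action labels
clf = SVC(kernel="rbf", probability=True).fit(figures, labels)
scores = clf.predict_proba(figures)[:, 1]          # per-window action scores
smoothed = gaussian_temporal_filter(scores)        # segment event sequences from these
```

In the paper's setting, the cubes would be extracted from roughly aligned human-figure regions, descriptors would use true optical-flow and gradient response maps, and SVM scores would be produced at multiple spatial pyramid levels before the Gaussian temporal filtering segments event sequences from the video stream.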
