Sensors (Basel, Switzerland)

Spatio-Temporal Action Detection in Untrimmed Videos by Using Multimodal Features and Region Proposals



Abstract

This paper proposes a novel deep neural network model for spatio-temporal action detection, which localizes every action region in an untrimmed video and classifies the corresponding actions. The proposed model uses a spatio-temporal region proposal method to detect multiple action regions effectively. First, in the temporal region proposal, anchor boxes are generated by targeting regions that are likely to contain actions. Unlike conventional temporal region proposal methods, the proposed method uses a complementary two-stage method to detect the temporal regions of actions that occur asynchronously. In addition, a spatial region proposal process is used to detect the principal agent performing an action among the people appearing in a video. Coarse-level features, which contain comprehensive information about the whole video, have been frequently used in conventional action-detection studies. However, they cannot provide detailed information about each person performing an action in a video. To overcome this limitation, the proposed model additionally learns fine-level features from the proposed action tubes in the video. Experiments conducted on the LIRIS-HARL and UCF-10 datasets confirm the high performance and effectiveness of the proposed deep neural network model.
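
The abstract mentions generating anchor boxes along the time axis as candidate action regions. The following is a minimal sketch of that general idea only, not the paper's two-stage method: it enumerates fixed-length candidate segments and scores their overlap with a reference segment using temporal intersection-over-union. The function names, the anchor lengths, and the stride are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

# A temporal segment as (start_frame, end_frame); hypothetical representation.
Segment = Tuple[int, int]


def temporal_anchors(num_frames: int, scales: List[int], stride: int) -> List[Segment]:
    """Enumerate candidate segments centered every `stride` frames.

    `scales` holds the anchor lengths in frames (an assumed design choice).
    Segments are clipped to the range [0, num_frames].
    """
    anchors: List[Segment] = []
    for center in range(0, num_frames, stride):
        for scale in scales:
            start = max(0, center - scale // 2)
            end = min(num_frames, center + scale // 2)
            if end > start:  # skip empty segments produced by clipping
                anchors.append((start, end))
    return anchors


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection-over-union of two segments on the time axis."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    candidates = temporal_anchors(num_frames=32, scales=[8, 16], stride=8)
    reference = (10, 22)  # made-up annotated action interval
    best = max(candidates, key=lambda seg: temporal_iou(seg, reference))
    print(best, temporal_iou(best, reference))
```

In a detector of this kind, candidates with high overlap against annotated intervals would typically serve as positive training examples, and a learned model would then score and refine them. How the paper's complementary two-stage method does this is not specified in the abstract.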
