Conference on Computer and Robot Vision

A Multi-Scale Hierarchical Codebook Method for Human Action Recognition in Videos Using a Single Example



Abstract

This paper presents a novel action matching method based on a hierarchical codebook of local spatio-temporal video volumes (STVs). Given a single example of an activity as a query video, the proposed method finds videos similar to the query in a video dataset. It is based on the bag of video words (BOV) representation and does not require prior knowledge about actions, background subtraction, motion estimation, or tracking. It is also robust to spatial and temporal scale changes, as well as some deformations. The hierarchical algorithm yields a compact subset of salient code words of STVs for the query video, and the likelihood of similarity between the query video and all STVs in the target video is then measured using a probabilistic inference mechanism. This hierarchy is achieved by initially constructing a codebook of STVs while accounting for the uncertainty in codebook construction, which is ignored in current versions of the BOV approach. At the second level of the hierarchy, a large contextual region containing many STVs (an ensemble of STVs) is considered in order to construct a probabilistic model of STVs and their spatio-temporal compositions. At the third level of the hierarchy, a codebook is formed for the ensembles of STVs based on their contextual similarities. The latter are the proposed labels (code words) for the actions exhibited in the video. Finally, at the highest level of the hierarchy, the salient labels for the actions are selected by analyzing the high-level code words assigned to each image pixel as a function of time. The algorithm was applied to three video datasets of varying complexity (KTH, Weizmann, and MSR II), and the results were superior to other approaches, especially in the cases of a single training example and cross-dataset action recognition.
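The first level of the hierarchy described above (a codebook of STV descriptors with the assignment uncertainty retained) can be illustrated with a minimal sketch. This is not the authors' implementation: the descriptors are toy 2-D points, the clustering is a plain k-means, and the soft-assignment weights use an assumed Gaussian kernel with a hypothetical `sigma` parameter — it only shows how probabilistic (soft) assignment differs from the hard assignment of standard BOV.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    # Simple k-means to build a codebook (code-word centers) from descriptors.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers

def soft_assign(x, centers, sigma=1.0):
    # Probabilistic assignment of one descriptor to all code words,
    # retaining the codebook uncertainty that hard assignment discards.
    d2 = ((centers - x) ** 2).sum(axis=1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    return w / w.sum()

# Toy "STV descriptors": two well-separated 2-D clusters.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (50, 2)),
               rng.normal(3.0, 0.1, (50, 2))])
codebook = kmeans(X, k=2)

# A descriptor midway between the clusters gets split membership
# (~0.5 / 0.5) rather than a single hard label.
p = soft_assign(np.array([1.5, 1.5]), codebook)
```

With hard assignment the ambiguous descriptor would be forced onto one code word; the soft weights instead propagate its ambiguity into the higher levels of the hierarchy.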
