Image and Vision Computing

Sparse B-spline polynomial descriptors for human activity recognition



Abstract

The extraction and quantization of local image and video descriptors for the subsequent creation of visual codebooks is a technique that has proved very effective for image and video retrieval applications. In this paper we build on this concept and propose a new set of visual descriptors that provide a local space-time description of the visual activity. The proposed descriptors are extracted at spatiotemporal salient points detected on the estimated optical flow field for a given image sequence, and are based on geometrical properties of three-dimensional piecewise polynomials, namely B-splines. The latter are fitted to the spatiotemporal locations of salient points that fall within a given spatiotemporal neighborhood. Our descriptors are invariant to translation and scaling in space-time. The latter is ensured by coupling the neighborhood dimensions to the scale at which the corresponding spatiotemporal salient points are detected. In addition, in order to provide robustness against camera motion (e.g. global translation due to camera panning), we subtract the motion component that is estimated by applying local median filters to the optical flow field. The descriptors extracted across the whole dataset are clustered in order to create a codebook of 'visual verbs', where each verb corresponds to a cluster center. We use the resulting codebook in a 'bag of verbs' approach in order to represent the motion of the subjects within small temporal windows. Finally, we use a boosting algorithm to select the most discriminative temporal windows of each class, and Relevance Vector Machines (RVM) for classification. Results on three different databases of human actions verify the effectiveness of our method.
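Three of the stages the abstract describes, median-filter-based camera-motion compensation on the flow field, clustering descriptors into a codebook of 'visual verbs', and histogramming verbs over small temporal windows ('bag of verbs'), can be illustrated with a minimal Python sketch. All function names, the codebook size, the median-filter window, and the temporal window length below are assumptions for illustration only; the B-spline descriptor extraction at salient points and the boosting/RVM classification stages are not reproduced.

```python
# Illustrative sketch of the motion compensation and 'bag of verbs' stages
# described in the abstract. `descriptors` stands in for the per-sequence
# descriptor sets; all parameter values are assumed, not the paper's.
import numpy as np
from scipy.ndimage import median_filter
from sklearn.cluster import KMeans

def compensate_camera_motion(flow, size=15):
    """Subtract a local-median estimate of global motion from an optical
    flow field (H x W x 2), for robustness to e.g. camera panning.
    The filter window size is an assumed parameter."""
    background = np.stack(
        [median_filter(flow[..., c], size=size) for c in range(2)], axis=-1
    )
    return flow - background

def build_codebook(all_descriptors, n_verbs=200, seed=0):
    """Cluster descriptors pooled from the whole dataset; each cluster
    center is one 'visual verb'. n_verbs is an assumed codebook size."""
    km = KMeans(n_clusters=n_verbs, n_init=10, random_state=seed)
    km.fit(np.vstack(all_descriptors))
    return km

def bag_of_verbs(km, descriptors, frame_ids, window=20):
    """Represent the motion within small temporal windows as normalized
    histograms of verb occurrences (one histogram per window)."""
    verbs = km.predict(descriptors)
    histograms = []
    for start in range(0, frame_ids.max() + 1, window):
        in_win = (frame_ids >= start) & (frame_ids < start + window)
        h = np.bincount(verbs[in_win], minlength=km.n_clusters).astype(float)
        if h.sum() > 0:
            h /= h.sum()  # normalize so windows of different density compare
        histograms.append(h)
    return np.array(histograms)
```

The resulting per-window histograms would then serve as the inputs over which a boosting algorithm selects the most discriminative temporal windows, with RVMs performing the final classification, as the abstract outlines.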
