Journal of Vision

Statistics of spatial-temporal concatenations of features at human fixations in action classification



Abstract

Humans can detect, recognize, and classify a range of actions quickly. What are the spatial-temporal features and computations that underlie this ability? Global representations such as spatial-temporal volumes can be highly informative, but they depend on segmentation and tracking. Local representations such as histograms of optic flow lack descriptive power and require extensive training. Recently, we developed a model in which any human action is encoded by a spatial-temporal concatenation of natural action structures (NASs), i.e., sequences of structured patches in human actions at multiple spatial-temporal scales. We compiled NASs from videos of natural human actions, examined the statistics of NASs, selected a set of highly informative NASs, and used them as features for action classification. We found that the NASs obtained in this way achieved significantly better recognition performance than simple spatial-temporal features. To examine the extent to which this model accounts for human action understanding, we hypothesized that humans search for informative NASs in this task and performed visual psychophysical studies. We asked 12 subjects with normal vision to classify 500 videos of human actions while we tracked their fixations with an EyeLink II eye tracker. We examined the statistics of the NASs compiled at the recorded fixations and found that human observers' fixations were sparsely distributed and usually deployed to locations in space-time where concatenations of local features are informative. We selected a set of NASs compiled at the fixations and used them as features for action classification. We found that the classification accuracy is comparable to human performance and to that of the same model with automatically selected NASs. We concluded that encoding natural human actions in terms of NASs and their spatial-temporal concatenations accounts for aspects of human action understanding.
