T-VLAD: Temporal vector of locally aggregated descriptor for multiview human action recognition

Naeem Hajra Binte; Murtaza Fiza; Yousaf Muhammad Haroon; Velastin Sergio A.


Abstract

Robust view-invariant human action recognition (HAR) requires an effective representation of an action's temporal structure in multi-view videos. This study explores a view-invariant action representation based on convolutional features. Action representation over long video segments is computationally expensive, whereas features from short video segments provide only local temporal coverage. Previous methods are based on complex multi-stream deep convolutional feature maps extracted over short segments. To cope with this issue, a novel framework is proposed based on a temporal vector of locally aggregated descriptors (T-VLAD). T-VLAD encodes the long-term temporal structure of the video using single-stream convolutional features over short segments. A standard VLAD vector size is a multiple of its feature codebook size (256 is normally recommended). VLAD is modified to incorporate time-order information of segments, so that the T-VLAD vector size is a multiple of its smaller time-order codebook size. Previous methods have not been extensively validated for view variation. Results are validated in a challenging setup, where one view is used for testing and the remaining views are used for training. State-of-the-art results have been obtained on three multi-view datasets with fixed cameras: IXMAS, MuHAVi and MCAD. The proposed encoding approach, T-VLAD, also works equally well on a dynamic background dataset, UCF101. (c) 2021 Elsevier B.V. All rights reserved.
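
To illustrate the remark that a standard VLAD vector size is a multiple of its feature codebook size, the following is a minimal NumPy sketch of standard VLAD encoding only. It is not the authors' T-VLAD, whose time-order codebook is not detailed in the abstract. The function name `vlad_encode`, the descriptor dimension of 64, the random codebook (in practice usually learned with k-means) and the final L2 normalization are all assumptions made for illustration.

```python
import numpy as np

def vlad_encode(descriptors, codebook):
    """Standard VLAD sketch: sum the residuals of descriptors to their nearest codeword."""
    k, d = codebook.shape
    # Squared Euclidean distance from every descriptor to every codeword, shape (n, k)
    dists = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)
    vlad = np.zeros((k, d))
    for i in range(k):
        members = descriptors[nearest == i]
        if len(members):
            # Accumulate residuals for codeword i
            vlad[i] = (members - codebook[i]).sum(axis=0)
    vlad = vlad.ravel()  # final length is k * d
    norm = np.linalg.norm(vlad)
    # L2 normalization (assumed here; a common final step for VLAD)
    return vlad / norm if norm > 0 else vlad

rng = np.random.default_rng(0)
codebook = rng.normal(size=(256, 64))     # K = 256 codewords; D = 64 is an assumed value
descriptors = rng.normal(size=(200, 64))  # stand-in for convolutional features
v = vlad_encode(descriptors, codebook)
print(v.shape)
```

With K = 256 codewords and D-dimensional features, the encoded vector has K x D entries, which is why a smaller codebook, as the abstract describes for T-VLAD, yields a more compact representation.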


Bibliographic details

Source
Pattern Recognition Letters | 2021, Issue 8 | pp. 22-28 | 7 pages
Authors
Naeem Hajra Binte; Murtaza Fiza; Yousaf Muhammad Haroon; Velastin Sergio A.
Author affiliations

Univ Engn & Technol Taxila Swarm Robot Lab NCRA Taxila 47050 Pakistan;

Inst Appl Sci & Technol PAF IAST Pak Austria Fachhsch Sino Pak Ctr Artificial Intelligence SPCAI Haripur Pakistan|Univ Engn & Technol Taxila Dept Comp Engn Taxila 47050 Pakistan;

Univ Engn & Technol Taxila Swarm Robot Lab NCRA Taxila 47050 Pakistan|Univ Engn & Technol Taxila Dept Comp Engn Taxila 47050 Pakistan;

Queen Mary Univ London Sch Elect Engn & Comp Sci London E1 4NS England;

Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
Original format: PDF
Language of text: English
Keywords
Human action recognition; Multi-view; View-invariant; Temporal action sequence; VLAD; 3D Convolutional neural network features; IXMAS; Muhavi; UCF101; Short segment features;


Similar literature

1. Aggregating the temporal coherent descriptors in videos using multiple learning kernel for action recognition [J]. Saleh Adel, Abdel-Nasser Mohamed, Angel Garcia Miguel. Pattern recognition letters. 2018, Issue APRa1.

2. Robust human action recognition based on spatio-temporal descriptors and motion temporal templates [J]. Jianfang Dou, Jianxun Li. Optik: Zeitschrift fur Licht- und Elektronenoptik: = Journal for Light-and Electronoptic. 2014, Issue 7.

3. Human Action Recognition from Multiple Views Based on View-Invariant Feature Descriptor Using Support Vector Machines [J]. Allah Bux Sargano, Plamen Angelov, Zulfiqar Habib. Applied Sciences. 2016, Issue 10.

4. Human Action Recognition using Improved Vector of Locally Aggregated Descriptors [C]. Shi-Ping Yang, Jin-Jang Leou. International conference on image processing, computer vision, pattern recognition. 2016.

5. Dynamic Descriptors in Human Gait Recognition [D]. Amin, Tahir. 2013.

6. Vehicle Detection in Aerial Images Using a Fast Oriented Region Search and the Vector of Locally Aggregated Descriptors [O]. Chongyang Liu, Yalin Ding, Ming Zhu. 2019.

7. Global Flow and Temporal-Shape Descriptors for Human Action Recognition from 3D Reconstruction Data [O]. Georgios Th. Papadopoulos, Petros Daras. 2017.
