Semantic embedding space for zero-shot action recognition

IEEE International Conference on Image Processing


Abstract

The number of categories for action recognition is growing rapidly. It is thus becoming increasingly hard to collect sufficient training data to learn conventional models for each category. This issue may be ameliorated by the increasingly popular "zero-shot learning" (ZSL) paradigm. In this framework, a mapping is constructed between visual features and a human-interpretable semantic description of each category, allowing categories to be recognised in the absence of any training data. Existing ZSL studies focus primarily on image data and attribute-based semantic representations. In this paper, we address zero-shot recognition in contemporary video action recognition tasks, using a semantic word vector space as the common space in which to embed videos and category labels. This is more challenging because the mapping between the semantic space and the space-time features of videos containing complex actions is more complex and harder to learn. We demonstrate that a simple self-training and data augmentation strategy can significantly improve the efficacy of this mapping. Experiments on the human action datasets HMDB51 and UCF101 demonstrate that our approach achieves state-of-the-art zero-shot action recognition performance.
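The pipeline the abstract describes can be illustrated with a minimal sketch: regress visual features onto the word vectors of seen-class labels, project unseen-class videos into that semantic space, and classify by nearest neighbour against unseen-class label embeddings, followed by a simple self-training step. The sketch below is an assumption-laden illustration, not the paper's exact method: it uses ridge regression as the visual-to-semantic mapping, cosine nearest-neighbour matching, and synthetic random data standing in for real video features (e.g., dense trajectories) and word2vec label embeddings; the self-training variant (re-estimating each unseen prototype from its K nearest projected test points) is likewise a simplification.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import normalize

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumptions): real systems would use video features
# such as dense trajectories and word2vec embeddings of the class names.
D_VIS, D_SEM = 200, 50                                     # feature dims
seen_protos = normalize(rng.normal(size=(40, D_SEM)))      # 40 seen classes
unseen_protos = normalize(rng.normal(size=(10, D_SEM)))    # 10 unseen classes

W_true = rng.normal(size=(D_VIS, D_SEM))  # hidden semantic->visual map

def make_videos(protos, n_per_class):
    """Generate visual features that cluster around each class prototype."""
    X, y = [], []
    for c, p in enumerate(protos):
        Z = p + 0.3 * rng.normal(size=(n_per_class, D_SEM))  # semantic jitter
        X.append(Z @ W_true.T + 0.1 * rng.normal(size=(n_per_class, D_VIS)))
        y += [c] * n_per_class
    return np.vstack(X), np.array(y)

X_train, y_train = make_videos(seen_protos, 30)    # seen classes only
X_test, y_test = make_videos(unseen_protos, 30)    # unseen classes only

# 1. Learn the visual -> semantic embedding on seen-class data.
reg = Ridge(alpha=1.0).fit(X_train, seen_protos[y_train])

# 2. Project unseen-class test videos into the semantic space.
Z_test = normalize(reg.predict(X_test))

def nn_classify(Z, protos):
    """Cosine nearest-neighbour against class label embeddings."""
    return (Z @ protos.T).argmax(axis=1)

print("ZSL accuracy:", (nn_classify(Z_test, unseen_protos) == y_test).mean())

# 3. Self-training (one simple flavour, an assumption here): re-estimate each
# unseen prototype as the mean of its K nearest projected test points, which
# reduces the shift between regressed vectors and the true label embeddings.
K = 10
adapted = normalize(np.array([
    Z_test[np.argsort(-(Z_test @ p))[:K]].mean(axis=0) for p in unseen_protos
]))
print("After self-training:", (nn_classify(Z_test, adapted) == y_test).mean())
```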
