
Coupling Adversarial Graph Embedding for transductive zero-shot action recognition



Abstract

Zero-shot action recognition (ZSAR) aims to recognize novel actions that have not been seen in the training stage. However, ZSAR often suffers from a serious domain shift problem, which causes poor performance. This is because: 1) Videos contain complicated intrinsic structures, including cross-sample visual correlations and cross-category semantic relationships, which make it challenging to generalize over the domain shift across categories and transfer knowledge across videos. 2) Existing methods do not disentangle the unique and shared information underlying unseen videos during embedding; they are weakly adaptive to novel categories and easily shift unseen videos toward irrelevant action prototypes. In this paper, we propose a novel Coupling Adversarial Graph Embedding (CAGE) method for ZSAR, which formulates an effective visual-to-semantic embedding to alleviate the domain shift problem. Our model operates in a transductive setting that assumes access to the full set of unseen videos. First, a structured graph is built to express both seen and unseen videos, integrally capturing the visual and semantic relationships between them. Then, an effective visual-to-semantic embedding is formulated based on a graph convolutional network (GCN), which generalizes to disjoint action categories and is optimized for label propagation. In addition, a pair of adversarial constraints is proposed to characterize the unique information of unseen videos and purify the shared information across categories, which further improves the adaptability and discriminability of our model. Experiments on the Olympic Sports, HMDB51 and UCF101 datasets show that our model achieves impressive performance on the ZSAR task. (C) 2021 Elsevier B.V. All rights reserved.
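The abstract describes building a graph over seen and unseen videos and propagating information with a GCN to obtain a visual-to-semantic embedding. The paper's actual architecture and adversarial constraints are not reproduced here; the following is only a minimal sketch of the standard GCN propagation step such an embedding builds on (graph, feature dimensions, and variable names are illustrative assumptions, not the authors' configuration):

```python
import numpy as np

def normalize_adjacency(A):
    """Symmetrically normalize adjacency with self-loops: D^-1/2 (A+I) D^-1/2."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def gcn_layer(A_norm, H, W):
    """One GCN propagation step: ReLU(A_norm @ H @ W)."""
    return np.maximum(A_norm @ H @ W, 0.0)

# Toy transductive graph: 4 video nodes (say, 2 seen and 2 unseen categories),
# connected by visual/semantic affinity; edges let labels propagate to unseen nodes.
rng = np.random.default_rng(0)
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
H = rng.normal(size=(4, 5))   # per-video visual features (dim 5, illustrative)
W = rng.normal(size=(5, 3))   # learnable projection into a semantic space (dim 3)

Z = gcn_layer(normalize_adjacency(A), H, W)  # semantic embeddings for all nodes
print(Z.shape)  # (4, 3)
```

Because seen and unseen videos sit in the same graph, each propagation step mixes a node's features with its neighbors', which is what lets supervision on seen categories influence the embeddings of unseen ones in the transductive setting.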

Bibliographic record

  • Source
    Neurocomputing | 2021, Issue 10 | pp. 239-252 | 14 pages
  • Author affiliations

    Beijing Jiaotong Univ Beijing Key Lab Traff Data Anal & Min Beijing Peoples R China;

    Beijing Jiaotong Univ Beijing Key Lab Traff Data Anal & Min Beijing Peoples R China;

    Beijing Jiaotong Univ Inst Informat Sci Beijing Peoples R China;

    Rochester Inst Technol B Thomas Golisano Coll Comp & Informat Sci Rochester NY 14623 USA;

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI);
  • Format: PDF
  • Language: English
  • Chinese Library Classification
  • Keywords

    Zero-shot learning; Action recognition; Graph convolutional network; Generative adversarial network;

