Signal Processing

Fusion by synthesizing: A multi-view deep neural network for zero-shot recognition



Abstract

Zero-shot learning (ZSL) aims to recognize objects without seeing any visual instances of them, by learning to transfer knowledge between seen and unseen classes. Attributes, which denote high-level visual entities or visual characteristics, have been widely used as an intermediate embedding space for knowledge transfer in the majority of existing ZSL approaches and have shown impressive performance. However, providing attribute annotations for unseen classes at test time is time-consuming and labor-intensive. Moreover, directly using attributes as the intermediation (embedding space) for knowledge transfer and zero-shot prediction inevitably leads to the projection domain shift and hubness problems. In this paper, we propose a novel multi-view deep neural network, termed Fusion by Synthesis (FS), which leverages word embeddings of classes as a complement to attributes and performs zero-shot prediction by fusing the word embeddings of unseen classes and the synthesized attributes in the visual feature space. Specifically, in the training phase, treating the visual features, attributes and word embeddings as three different views of visual instances, FS equips each view with a denoising auto-encoder to simultaneously ensure robust view-specific reconstruction and cross-view synthesis, while preserving the discriminability of class labels. During testing, FS synthesizes the absent attributes for unseen classes and fuses them with word embeddings in the visual feature space to perform zero-shot prediction. In addition, FS can learn from partial views, where either the attribute view or the word embedding view is missing during training, and can synthesize whichever view is missing from the one provided. Extensive experiments on six benchmark datasets covering both image classification and action recognition show that FS effectively fuses multi-view data by synthesis and achieves superior performance compared with state-of-the-art ZSL methods. (C) 2019 Elsevier B.V. All rights reserved.
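To make the described architecture concrete, below is a minimal PyTorch sketch of the core idea: one denoising auto-encoder per non-visual view, each with a decoder for its own view and a cross-decoder that synthesizes the visual view. This is an illustration under assumed settings (the feature dimensions, noise level, plain MSE losses and the fusion-by-averaging step are all hypothetical), not the authors' implementation; the class-label discrimination term and the visual-view auto-encoder are omitted for brevity.

```python
# A minimal sketch of the FS idea (not the authors' implementation):
# one denoising auto-encoder per non-visual view, with a view-specific
# decoder and a cross-decoder into the visual feature space.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ViewDAE(nn.Module):
    """Denoising auto-encoder for one view: encodes a noise-corrupted
    input, reconstructs the view itself, and synthesizes the visual view."""
    def __init__(self, in_dim, hid_dim, vis_dim, noise_std=0.1):
        super().__init__()
        self.noise_std = noise_std
        self.enc = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU())
        self.dec_self = nn.Linear(hid_dim, in_dim)  # view-specific reconstruction
        self.dec_vis = nn.Linear(hid_dim, vis_dim)  # cross-view synthesis

    def forward(self, x):
        z = self.enc(x + self.noise_std * torch.randn_like(x))
        return self.dec_self(z), self.dec_vis(z)

# Assumed sizes: 2048-d CNN features, 85-d attributes, 300-d word embeddings.
vis_dim, attr_dim, word_dim, hid_dim = 2048, 85, 300, 512
attr_dae = ViewDAE(attr_dim, hid_dim, vis_dim)
word_dae = ViewDAE(word_dim, hid_dim, vis_dim)

def fs_loss(x_vis, x_attr, x_word):
    """View-specific reconstruction plus cross-view synthesis losses."""
    attr_rec, attr2vis = attr_dae(x_attr)
    word_rec, word2vis = word_dae(x_word)
    return (F.mse_loss(attr_rec, x_attr) + F.mse_loss(word_rec, x_word)
            + F.mse_loss(attr2vis, x_vis) + F.mse_loss(word2vis, x_vis))

# At test time, an unseen class known only by its word embedding would have
# its attribute view synthesized, and both views fused (e.g., averaged) in
# the visual feature space for nearest-neighbour zero-shot prediction.
```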
