Signal Processing

Fusion by synthesizing: A multi-view deep neural network for zero-shot recognition



Abstract

Zero-shot learning (ZSL) aims to recognize objects without seeing any visual instances of them, by learning to transfer knowledge between seen and unseen classes. Attributes, which denote high-level visual entities or visual characteristics, have been widely used as an intermediate embedding space for knowledge transfer in the majority of existing ZSL approaches and have shown impressive performance. However, providing attribute annotations for unseen classes at test time is time-consuming and labor-intensive. Moreover, directly using attributes as the intermediation (embedding space) for knowledge transfer and zero-shot prediction inevitably leads to the projection domain shift and hubness problems. In this paper, we propose a novel multi-view deep neural network, termed Fusion by Synthesis (FS), which leverages word embeddings of classes as a complement to attributes and performs zero-shot prediction by fusing the word embeddings of unseen classes and the synthesized attributes in the visual feature space. Specifically, in the training phase, treating the visual features, attributes and word embeddings as three different views of visual instances, FS equips each view with a denoising auto-encoder to simultaneously ensure robust view-specific reconstruction and cross-view synthesis, while preserving the discriminability of class labels. During testing, FS synthesizes the absent attributes for unseen classes and fuses them with word embeddings in the visual feature space to perform zero-shot prediction. In addition, FS can learn from partial views, where either the attribute view or the word embedding view is missing during training, and can synthesize whichever view is missing from the one provided. Extensive experiments on six benchmark datasets covering both image classification and action recognition show that FS effectively fuses multi-view data by synthesis and achieves superior performance compared with state-of-the-art ZSL methods. (C) 2019 Elsevier B.V. All rights reserved.
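To make the described architecture concrete, below is a minimal PyTorch sketch of the core idea: one denoising auto-encoder per non-visual view, each with a decoder for its own view and a cross-decoder that synthesizes the visual view. This is an illustration under assumed settings (the feature dimensions, noise level, plain MSE losses and the fusion-by-averaging step are all hypothetical), not the authors' implementation; the class-label discrimination term and the visual-view auto-encoder are omitted for brevity.

```python
# A minimal sketch of the FS idea (not the authors' implementation):
# one denoising auto-encoder per non-visual view, with a view-specific
# decoder and a cross-decoder into the visual feature space.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ViewDAE(nn.Module):
    """Denoising auto-encoder for one view: encodes a noise-corrupted
    input, reconstructs the view itself, and synthesizes the visual view."""
    def __init__(self, in_dim, hid_dim, vis_dim, noise_std=0.1):
        super().__init__()
        self.noise_std = noise_std
        self.enc = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU())
        self.dec_self = nn.Linear(hid_dim, in_dim)  # view-specific reconstruction
        self.dec_vis = nn.Linear(hid_dim, vis_dim)  # cross-view synthesis

    def forward(self, x):
        z = self.enc(x + self.noise_std * torch.randn_like(x))
        return self.dec_self(z), self.dec_vis(z)

# Assumed sizes: 2048-d CNN features, 85-d attributes, 300-d word embeddings.
vis_dim, attr_dim, word_dim, hid_dim = 2048, 85, 300, 512
attr_dae = ViewDAE(attr_dim, hid_dim, vis_dim)
word_dae = ViewDAE(word_dim, hid_dim, vis_dim)

def fs_loss(x_vis, x_attr, x_word):
    """View-specific reconstruction plus cross-view synthesis losses."""
    attr_rec, attr2vis = attr_dae(x_attr)
    word_rec, word2vis = word_dae(x_word)
    return (F.mse_loss(attr_rec, x_attr) + F.mse_loss(word_rec, x_word)
            + F.mse_loss(attr2vis, x_vis) + F.mse_loss(word2vis, x_vis))

# At test time, an unseen class known only by its word embedding would have
# its attribute view synthesized, and both views fused (e.g., averaged) in
# the visual feature space for nearest-neighbour zero-shot prediction.
```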
