Pattern Recognition: The Journal of the Pattern Recognition Society

Lightweight dynamic conditional GAN with pyramid attention for text-to-image synthesis

Abstract

The text-to-image synthesis task aims to generate photographic images conditioned on semantic text descriptions. To ensure the sharpness and fidelity of the generated images, this task tends to generate high-resolution images (e.g., 128² or 256²). However, as the resolution increases, the network parameters and complexity increase dramatically. Recent works introduce network structures with extensive parameters and heavy computation to guarantee the production of high-resolution images. As a result, these models suffer from an unstable training process and high training cost. To tackle these issues, in this paper we propose an effective information-compensation-based approach, namely the Lightweight Dynamic Conditional GAN (LD-CGAN). LD-CGAN is a compact and structured single-stream network consisting of one generator and two independent discriminators that regularize and generate 64² and 128² images in one feed-forward process. Specifically, the generator of LD-CGAN is composed of three major components: (1) Conditional Embedding (CE), an automatic unsupervised learning process aimed at disentangling integrated semantic attributes in the text space; (2) the Conditional Manipulating Modular (CM-M) in the Conditional Manipulating Block (CM-B), designed to continuously provide the image features with compensation information (i.e., the disentangled attributes); and (3) the Pyramid Attention Refine Block (PAR-B), used to enrich multi-scale features by capturing the spatial importance among multi-scale contexts. Experiments conducted on two benchmark datasets, CUB and Oxford-102, indicate that our generated 128² images achieve performance comparable to the 256² images generated by state-of-the-art methods on two evaluation metrics: Inception Score (IS) and Visual-semantic Similarity (VS). Compared with the current state-of-the-art HDGAN, our LD-CGAN significantly decreases the number of parameters and the computation time by 86.8% and 94.9%, respectively. (c) 2020 Elsevier Ltd. All rights reserved.
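A minimal PyTorch sketch (not the authors' released code) of the single-stream design described above: one generator emits 64² and 128² images in a single forward pass, with a simplified pyramid-attention block standing in for PAR-B. All module names, channel sizes, and the attention formulation here are illustrative assumptions; the two resolutions would each be judged by an independent discriminator during training.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PyramidAttention(nn.Module):
    """Pool features at several scales, upsample back, and fuse them with
    learned per-pixel attention weights (an assumed reading of PAR-B)."""

    def __init__(self, channels, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.attn = nn.Conv2d(channels * len(scales), len(scales), kernel_size=1)

    def forward(self, x):
        h, w = x.shape[-2:]
        # Build the multi-scale context pyramid at the original resolution.
        pyramid = [
            x if s == 1
            else F.interpolate(F.avg_pool2d(x, s), size=(h, w), mode="nearest")
            for s in self.scales
        ]
        weights = torch.softmax(self.attn(torch.cat(pyramid, dim=1)), dim=1)
        # Spatially weighted sum of the per-scale context maps.
        return sum(weights[:, i:i + 1] * p for i, p in enumerate(pyramid))


class LDCGANGenerator(nn.Module):
    """Single-stream generator: noise + text embedding -> 64^2 and 128^2 images."""

    def __init__(self, z_dim=100, text_dim=128, base_ch=64):
        super().__init__()
        self.base_ch = base_ch
        self.fc = nn.Linear(z_dim + text_dim, base_ch * 8 * 4 * 4)

        def up(cin, cout):
            return nn.Sequential(
                nn.Upsample(scale_factor=2, mode="nearest"),
                nn.Conv2d(cin, cout, 3, padding=1),
                nn.BatchNorm2d(cout),
                nn.ReLU(inplace=True),
            )

        self.blocks_to_64 = nn.Sequential(        # 4x4 -> 64x64
            up(base_ch * 8, base_ch * 4),
            up(base_ch * 4, base_ch * 2),
            up(base_ch * 2, base_ch),
            up(base_ch, base_ch),
        )
        self.par = PyramidAttention(base_ch)
        self.block_to_128 = up(base_ch, base_ch)  # 64x64 -> 128x128
        self.to_rgb64 = nn.Conv2d(base_ch, 3, 3, padding=1)
        self.to_rgb128 = nn.Conv2d(base_ch, 3, 3, padding=1)

    def forward(self, z, text_emb):
        h = self.fc(torch.cat([z, text_emb], dim=1))
        h = h.view(-1, self.base_ch * 8, 4, 4)
        h64 = self.par(self.blocks_to_64(h))      # refine before branching
        h128 = self.block_to_128(h64)
        return torch.tanh(self.to_rgb64(h64)), torch.tanh(self.to_rgb128(h128))


# Both resolutions come from one feed-forward pass.
z, t = torch.randn(2, 100), torch.randn(2, 128)
img64, img128 = LDCGANGenerator()(z, t)
assert img64.shape[-1] == 64 and img128.shape[-1] == 128
```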
