Pattern Recognition: The Journal of the Pattern Recognition Society

Lightweight dynamic conditional GAN with pyramid attention for text-to-image synthesis

Abstract

The text-to-image synthesis task aims to generate photographic images conditioned on semantic text descriptions. To ensure the sharpness and fidelity of the generated images, this task tends to generate high-resolution images (e.g., 128² or 256²). However, as the resolution increases, the network parameters and complexity increase dramatically. Recent works introduce network structures with extensive parameters and heavy computation to guarantee the production of high-resolution images. As a result, these models suffer from an unstable training process and high training cost. To tackle these issues, in this paper we propose an effective information-compensation-based approach, namely the Lightweight Dynamic Conditional GAN (LD-CGAN). LD-CGAN is a compact and structured single-stream network consisting of one generator and two independent discriminators that regularize and generate 64² and 128² images in one feed-forward process. Specifically, the generator of LD-CGAN is composed of three major components: (1) Conditional Embedding (CE), an automatic unsupervised learning process aimed at disentangling integrated semantic attributes in the text space; (2) the Conditional Manipulating Modular (CM-M) in the Conditional Manipulating Block (CM-B), designed to continuously provide the image features with compensation information (i.e., the disentangled attributes); and (3) the Pyramid Attention Refine Block (PAR-B), used to enrich multi-scale features by capturing the spatial importance among multi-scale contexts. Experiments conducted on two benchmark datasets, CUB and Oxford-102, indicate that our generated 128² images achieve performance comparable to the 256² images generated by state-of-the-art methods on two evaluation metrics: Inception Score (IS) and Visual-semantic Similarity (VS). Compared with the current state-of-the-art HDGAN, our LD-CGAN significantly decreases the number of parameters and the computation time by 86.8% and 94.9%, respectively. (c) 2020 Elsevier Ltd. All rights reserved.
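A minimal PyTorch sketch (not the authors' released code) of the single-stream design described above: one generator emits 64² and 128² images in a single forward pass, with a simplified pyramid-attention block standing in for PAR-B. All module names, channel sizes, and the attention formulation here are illustrative assumptions; the two resolutions would each be judged by an independent discriminator during training.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PyramidAttention(nn.Module):
    """Pool features at several scales, upsample back, and fuse them with
    learned per-pixel attention weights (an assumed reading of PAR-B)."""

    def __init__(self, channels, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.attn = nn.Conv2d(channels * len(scales), len(scales), kernel_size=1)

    def forward(self, x):
        h, w = x.shape[-2:]
        # Build the multi-scale context pyramid at the original resolution.
        pyramid = [
            x if s == 1
            else F.interpolate(F.avg_pool2d(x, s), size=(h, w), mode="nearest")
            for s in self.scales
        ]
        weights = torch.softmax(self.attn(torch.cat(pyramid, dim=1)), dim=1)
        # Spatially weighted sum of the per-scale context maps.
        return sum(weights[:, i:i + 1] * p for i, p in enumerate(pyramid))


class LDCGANGenerator(nn.Module):
    """Single-stream generator: noise + text embedding -> 64^2 and 128^2 images."""

    def __init__(self, z_dim=100, text_dim=128, base_ch=64):
        super().__init__()
        self.base_ch = base_ch
        self.fc = nn.Linear(z_dim + text_dim, base_ch * 8 * 4 * 4)

        def up(cin, cout):
            return nn.Sequential(
                nn.Upsample(scale_factor=2, mode="nearest"),
                nn.Conv2d(cin, cout, 3, padding=1),
                nn.BatchNorm2d(cout),
                nn.ReLU(inplace=True),
            )

        self.blocks_to_64 = nn.Sequential(        # 4x4 -> 64x64
            up(base_ch * 8, base_ch * 4),
            up(base_ch * 4, base_ch * 2),
            up(base_ch * 2, base_ch),
            up(base_ch, base_ch),
        )
        self.par = PyramidAttention(base_ch)
        self.block_to_128 = up(base_ch, base_ch)  # 64x64 -> 128x128
        self.to_rgb64 = nn.Conv2d(base_ch, 3, 3, padding=1)
        self.to_rgb128 = nn.Conv2d(base_ch, 3, 3, padding=1)

    def forward(self, z, text_emb):
        h = self.fc(torch.cat([z, text_emb], dim=1))
        h = h.view(-1, self.base_ch * 8, 4, 4)
        h64 = self.par(self.blocks_to_64(h))      # refine before branching
        h128 = self.block_to_128(h64)
        return torch.tanh(self.to_rgb64(h64)), torch.tanh(self.to_rgb128(h128))


# Both resolutions come from one feed-forward pass.
z, t = torch.randn(2, 100), torch.randn(2, 128)
img64, img128 = LDCGANGenerator()(z, t)
assert img64.shape[-1] == 64 and img128.shape[-1] == 128
```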
