Conference paper: International Conference on Informatics, Electronics and Vision; International Conference on Imaging, Vision and Pattern Recognition

PerceptionGAN: Real-world Image Construction from Provided Text through Perceptual Understanding



Abstract

Generating an image from a provided descriptive text is a challenging task because of the difficulty of incorporating perceptual information (object shapes, colors, and their interactions) while keeping the generated image highly relevant to the given text. Current methods first generate an initial low-resolution image, which typically has irregular object shapes, colors, and interactions between objects. This initial image is then refined by conditioning on the text. However, these methods mainly address the problem of using the text representation efficiently during refinement of the initially generated image, while the success of this refinement process depends heavily on the quality of that initial image, as pointed out in the Dynamic Memory Generative Adversarial Network (DM-GAN) paper. Hence, we propose a method that provides well-initialized images by incorporating perceptual understanding in the discriminator module. We improve the perceptual information at the first stage itself, which results in a significant improvement in the final generated image. In this paper, we apply our approach to the StackGAN architecture and show that the perceptual information contained in the initial image is improved while modeling the image distribution at multiple stages. Finally, we generate realistic multi-colored images conditioned on text. These images are of good quality and contain improved basic perceptual information. More importantly, the proposed method can be integrated into the pipelines of other state-of-the-art text-to-image generation models, such as DM-GAN and AttnGAN, to generate their initial low-resolution images. We also improve the refinement process in StackGAN by augmenting the third stage of the generator-discriminator pair in the StackGAN architecture. Our experimental analysis and comparison with the state of the art on the large but sparse MS COCO dataset further validate the usefulness of our proposed approach.
Contribution: This paper improves the text-to-image generation pipeline by incorporating perceptual understanding in the initial stage of image generation.
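The abstract does not include code, but the core idea, augmenting the discriminator's objective with a perceptual-consistency term so the first-stage image already has plausible shapes and colors, can be illustrated with a minimal NumPy sketch. All names here (`feature_extractor`, `perceptual_loss`, the weight `lam`) are hypothetical stand-ins, not the paper's actual implementation; a toy linear projection plays the role of a convolutional feature extractor.

```python
import numpy as np

rng = np.random.default_rng(0)

def feature_extractor(img, W):
    # Toy stand-in for a conv feature network: linear projection + ReLU.
    return np.maximum(0.0, img.ravel() @ W)

def perceptual_loss(feat_real, feat_fake):
    # L2 distance between feature embeddings; penalizes mismatched
    # perceptual statistics (rough shape/color structure).
    return float(np.mean((feat_real - feat_fake) ** 2))

def adversarial_loss(logit_real, logit_fake):
    # Standard binary cross-entropy discriminator loss.
    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))
    return float(-np.log(sigmoid(logit_real)) - np.log(1.0 - sigmoid(logit_fake)))

def discriminator_loss(real, fake, logit_real, logit_fake, W, lam=0.1):
    # Adversarial term plus a weighted perceptual-consistency term,
    # the kind of augmentation the paper proposes for the first stage.
    adv = adversarial_loss(logit_real, logit_fake)
    perc = perceptual_loss(feature_extractor(real, W),
                           feature_extractor(fake, W))
    return adv + lam * perc

# Toy 8x8 "images" and a random projection standing in for conv features.
real = rng.standard_normal((8, 8))
fake = rng.standard_normal((8, 8))
W = rng.standard_normal((64, 16))
loss = discriminator_loss(real, fake, logit_real=2.0, logit_fake=-1.5, W=W)
print(f"combined discriminator loss: {loss:.4f}")
```

In the real model the feature extractor would be a trained convolutional network and the two loss terms would be backpropagated through the generator-discriminator pair at each StackGAN stage; the sketch only shows how the perceptual term enters the objective.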

