Conference paper: International Conference on Informatics, Electronics and Vision; International Conference on Imaging, Vision and Pattern Recognition

PerceptionGAN: Real-world Image Construction from Provided Text through Perceptual Understanding



Abstract

Generating an image from a provided descriptive text is a challenging task because of the difficulty of incorporating perceptual information (object shapes, colors, and their interactions) while keeping the generated image highly relevant to the given text. Current methods first generate an initial low-resolution image, which typically has irregular object shapes, colors, and interactions between objects. This initial image is then refined by conditioning on the text. However, these methods mainly address the problem of using the text representation efficiently during refinement of the initially generated image, while the success of this refinement process depends heavily on the quality of that initial image, as pointed out in the Dynamic Memory Generative Adversarial Network (DM-GAN) paper. Hence, we propose a method that provides well-initialized images by incorporating perceptual understanding in the discriminator module. We improve the perceptual information at the first stage itself, which results in a significant improvement in the final generated image. In this paper, we apply our approach to the StackGAN architecture and show that the perceptual information contained in the initial image is improved while modeling the image distribution at multiple stages. Finally, we generate realistic multi-colored images conditioned on text. These images are of good quality and contain improved basic perceptual information. More importantly, the proposed method can be integrated into the pipelines of other state-of-the-art text-to-image generation models, such as DM-GAN and AttnGAN, to generate their initial low-resolution images. We also improve the refinement process in StackGAN by augmenting the third stage of the generator-discriminator pair in the StackGAN architecture. Our experimental analysis and comparison with the state of the art on the large but sparse MS COCO dataset further validate the usefulness of our proposed approach.
Contribution: This paper improves the text-to-image generation pipeline by incorporating perceptual understanding in the initial stage of image generation.
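The abstract does not include code, but the core idea, augmenting the discriminator's objective with a perceptual-consistency term so the first-stage image already has plausible shapes and colors, can be illustrated with a minimal NumPy sketch. All names here (`feature_extractor`, `perceptual_loss`, the weight `lam`) are hypothetical stand-ins, not the paper's actual implementation; a toy linear projection plays the role of a convolutional feature extractor.

```python
import numpy as np

rng = np.random.default_rng(0)

def feature_extractor(img, W):
    # Toy stand-in for a conv feature network: linear projection + ReLU.
    return np.maximum(0.0, img.ravel() @ W)

def perceptual_loss(feat_real, feat_fake):
    # L2 distance between feature embeddings; penalizes mismatched
    # perceptual statistics (rough shape/color structure).
    return float(np.mean((feat_real - feat_fake) ** 2))

def adversarial_loss(logit_real, logit_fake):
    # Standard binary cross-entropy discriminator loss.
    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))
    return float(-np.log(sigmoid(logit_real)) - np.log(1.0 - sigmoid(logit_fake)))

def discriminator_loss(real, fake, logit_real, logit_fake, W, lam=0.1):
    # Adversarial term plus a weighted perceptual-consistency term,
    # the kind of augmentation the paper proposes for the first stage.
    adv = adversarial_loss(logit_real, logit_fake)
    perc = perceptual_loss(feature_extractor(real, W),
                           feature_extractor(fake, W))
    return adv + lam * perc

# Toy 8x8 "images" and a random projection standing in for conv features.
real = rng.standard_normal((8, 8))
fake = rng.standard_normal((8, 8))
W = rng.standard_normal((64, 16))
loss = discriminator_loss(real, fake, logit_real=2.0, logit_fake=-1.5, W=W)
print(f"combined discriminator loss: {loss:.4f}")
```

In the real model the feature extractor would be a trained convolutional network and the two loss terms would be backpropagated through the generator-discriminator pair at each StackGAN stage; the sketch only shows how the perceptual term enters the objective.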

