Journal: Image and Vision Computing

Transformer models for enhancing AttnGAN based text to image generation


Abstract

Deep neural networks are capable of producing photographic images that depict given natural language text descriptions. Such models have huge potential in applications such as interior design, video games, image editing, and facial sketching for digital forensics. However, only a limited number of methods in the literature have been developed for text-to-image (TTI) generation, and most of them are deep learning methods based on Generative Adversarial Networks (GANs). Attentional GAN (AttnGAN) is a popular GAN-based TTI method that extracts meaningful information from the given text descriptions using an attention mechanism. In this paper, we investigate the use of different Transformer models such as BERT, GPT2, and XLNet with AttnGAN to address the challenge of extracting semantic information from text descriptions. Accordingly, the proposed AttnGAN(TRANS) architecture has three variants: AttnGAN(BERT), AttnGAN(XL), and AttnGAN(GPT). The proposed method outperforms the conventional AttnGAN, improving the Inception Score by 27.23% and reducing the Frechet Inception Distance by 49.9%. The results of our experiments indicate that the proposed method has the potential to outperform contemporary state-of-the-art methods and validate the use of Transformer models in improving the performance of TTI generation. The code is made publicly available at https://github.com/sairamkiran9/AttnGAN-trans. (C) 2021 Elsevier B.V. All rights reserved.
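The core idea described above is to swap AttnGAN's recurrent text encoder for a Transformer that yields both word-level features (consumed by the attention module) and a sentence-level feature (conditioning the initial generator stage). The following is a minimal sketch of that interface using a toy PyTorch Transformer as a stand-in; the class name, dimensions, and pooling choice are illustrative assumptions, not the paper's actual implementation, which uses pretrained BERT/GPT2/XLNet encoders.

```python
import torch
import torch.nn as nn

class TransformerTextEncoder(nn.Module):
    """Toy stand-in for a pretrained Transformer (BERT/GPT2/XLNet)
    in an AttnGAN-style pipeline: maps token ids to word-level and
    sentence-level embeddings. Dimensions are illustrative."""

    def __init__(self, vocab_size=30522, d_model=256, nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, token_ids):
        h = self.encoder(self.embed(token_ids))   # (B, T, d_model)
        word_features = h                          # fed to the attention module
        sentence_feature = h.mean(dim=1)           # (B, d_model), conditions the generator
        return word_features, sentence_feature

# One fake 12-token caption; in practice these ids come from the
# Transformer's own tokenizer applied to the text description.
tokens = torch.randint(0, 30522, (1, 12))
w, s = TransformerTextEncoder()(tokens)
```

In the real architecture, the pretrained Transformer's hidden states would replace the bidirectional LSTM outputs of the original AttnGAN, so the downstream attention and DAMSM components see the same two-tensor interface.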
