International Conference on Computer Communication and the Internet

ViT-GAN: Using Vision Transformer as Discriminator with Adaptive Data Augmentation

Abstract

Attention mechanisms are now regarded as an effective tool for image recognition. The Vision Transformer (ViT) applies a Transformer to images and achieves very high recognition performance with fewer parameters than Big Transfer (BiT) and Noisy Student. We therefore consider Self-Attention-based networks to be slimmer than convolution-based networks. We use a ViT as the discriminator in a Generative Adversarial Network (GAN) to obtain the same performance with a smaller model, which we name ViT-GAN. We also find that parameter sharing is very useful for making the ViT parameter-efficient. However, ViT's performance depends heavily on the number of data samples, so we propose a new data augmentation method in which the augmentation strength varies adaptively, helping the ViT converge faster and perform better. With our data augmentation, we show that the ViT-based discriminator achieves almost the same FID with 35% fewer parameters than the original discriminator.
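The abstract does not spell out how the augmentation strength is adapted. A common approach to adaptive discriminator augmentation (as in StyleGAN2-ADA) is to raise the augmentation probability when the discriminator starts overfitting, i.e. when it scores most real samples positive, and lower it otherwise. The sketch below illustrates that feedback loop only; the class name, target value, and step size are illustrative assumptions, not the paper's exact algorithm.

```python
class AdaptiveAugmentation:
    """Illustrative ADA-style controller (assumed, not the paper's exact
    method): the augmentation probability p rises while the discriminator
    overfits and falls otherwise."""

    def __init__(self, target=0.6, step=0.01, p=0.0):
        self.target = target  # desired fraction of positive real logits
        self.step = step      # adjustment applied per update
        self.p = p            # current augmentation probability in [0, 1]

    def update(self, real_logits):
        # r_t: fraction of real samples the discriminator scores positive;
        # values near 1.0 signal overfitting to the training set.
        r_t = sum(1 for x in real_logits if x > 0) / len(real_logits)
        if r_t > self.target:
            self.p = min(1.0, self.p + self.step)  # augment more aggressively
        else:
            self.p = max(0.0, self.p - self.step)  # relax augmentation
        return self.p
```

During GAN training, `update` would be called every few discriminator steps, and each real or fake image would then be augmented with probability `p` before being passed to the ViT discriminator.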
