Journal: IEEE Transactions on Multimedia

CKD: Cross-Task Knowledge Distillation for Text-to-Image Synthesis

Abstract

Text-to-image synthesis (T2IS), which automatically generates images conditioned on text descriptions, has drawn increasing interest recently. It is a highly challenging task that learns a mapping from the semantic space of text descriptions to the complex RGB pixel space of images. The main issues of T2IS lie in two aspects: semantic consistency and visual quality. The distributions of text descriptions and image contents are inconsistent because they belong to different modalities, so it is difficult to generate images whose semantic content is consistent with the text descriptions; this is the semantic consistency issue. Moreover, because of the discrepancy between the data distributions of real and synthetic images in the huge pixel space, it is hard to approximate the real data distribution well enough to synthesize photo-realistic images; this is the visual quality issue. To address these issues, we propose a cross-task knowledge distillation (CKD) approach that transfers knowledge from multiple image semantic understanding tasks into the T2IS task. Image semantic understanding tasks contain a large amount of knowledge about translating image contents into semantic representations, which helps address both the semantic consistency and the visual quality issues of T2IS. Moreover, we design a multi-stage knowledge distillation paradigm that decomposes the distillation process into multiple stages. With this paradigm, the model can effectively approximate the distribution of real images and understand textual information for T2IS, which improves the visual quality and semantic consistency of the synthetic images. Comprehensive experiments on widely used datasets show the effectiveness of the proposed CKD approach.
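
The abstract does not give the loss functions or the stage schedule, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that distillation is done by feature matching: each frozen teacher from an image understanding task maps an image to a feature vector, and the generator is penalized when the features of a synthetic image differ from those of the paired real image. The teacher functions (`toy_teacher_a`, `toy_teacher_b`), the task names, and the per-stage weights in `STAGES` are all hypothetical placeholders.

```python
from typing import Callable, Dict, List, Sequence

# A "teacher" maps an image (here a flat list of pixel values) to features.
Teacher = Callable[[Sequence[float]], List[float]]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(
    real_image: Sequence[float],
    fake_image: Sequence[float],
    teachers: Dict[str, Teacher],
    weights: Dict[str, float],
) -> float:
    """Weighted sum of feature-matching losses over the teachers active in a stage."""
    total = 0.0
    for name, weight in weights.items():
        teacher = teachers[name]
        total += weight * mse(teacher(real_image), teacher(fake_image))
    return total


# Hypothetical toy teachers standing in for pretrained understanding networks.
def toy_teacher_a(image: Sequence[float]) -> List[float]:
    return [sum(image) / len(image)]  # global mean as a 1-d "feature"


def toy_teacher_b(image: Sequence[float]) -> List[float]:
    return [max(image), min(image)]  # value range as a 2-d "feature"


TEACHERS: Dict[str, Teacher] = {"task_a": toy_teacher_a, "task_b": toy_teacher_b}

# Hypothetical multi-stage schedule: each stage activates teachers with its own weights.
STAGES: List[Dict[str, float]] = [
    {"task_a": 1.0},
    {"task_a": 0.5, "task_b": 1.0},
]

real = [0.0, 0.5, 1.0]  # stand-in for a real image
fake = [0.0, 0.5, 0.7]  # stand-in for a generator output
losses = [distillation_loss(real, fake, TEACHERS, w) for w in STAGES]
```

In a real system the teachers would be frozen pretrained networks, and each stage's loss would be added to the generator's adversarial objective. The sketch only shows how a multi-teacher, multi-stage schedule could be organized.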