This paper addresses the problem of unsupervised image-to-image translation. More specifically, we aim to find a translation network such that objects and shapes that appear only in the source domain are translated to objects and shapes that appear only in the target domain, while style and color features present in the source domain are preserved. To achieve this, we train a domain-specific variational autoencoder for each domain and represent every image by its latent code. In a second step, we learn a translation between the latent spaces of the two domains using generative adversarial networks. We evaluate this framework on multiple datasets and study the effect of several perceptual losses. Experiments on the MNIST and SVHN datasets show the effectiveness of the proposed translation method.
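To make the two-step pipeline concrete, the sketch below shows one possible PyTorch realization: a per-domain VAE encodes images into latent codes, and a small translator network maps source codes to target codes while an adversarial critic pushes the translated codes toward the target latent distribution. All module names (Encoder, Decoder, Translator, Critic), layer sizes, the latent dimensionality, and the 28x28 grayscale input shape are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch of the described pipeline, under the assumptions above.
import torch
import torch.nn as nn

LATENT = 16  # assumed latent dimensionality

class Encoder(nn.Module):
    """Domain-specific VAE encoder: image -> (mu, log_var)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 256), nn.ReLU())
        self.mu = nn.Linear(256, LATENT)
        self.log_var = nn.Linear(256, LATENT)

    def forward(self, x):
        h = self.net(x)
        return self.mu(h), self.log_var(h)

class Decoder(nn.Module):
    """Domain-specific VAE decoder: latent code -> image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT, 256), nn.ReLU(),
            nn.Linear(256, 28 * 28), nn.Sigmoid())

    def forward(self, z):
        return self.net(z).view(-1, 1, 28, 28)

def reparameterize(mu, log_var):
    """Standard VAE reparameterization trick."""
    return mu + torch.randn_like(mu) * torch.exp(0.5 * log_var)

class Translator(nn.Module):
    """GAN generator: maps source-domain latents to target-domain latents."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT, 64), nn.ReLU(), nn.Linear(64, LATENT))

    def forward(self, z):
        return self.net(z)

class Critic(nn.Module):
    """Discriminates real target-domain latents from translated ones."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, z):
        return self.net(z)

# One adversarial step on the latent spaces; the per-domain VAEs are
# assumed pre-trained (step one) and frozen here.
enc_src, enc_tgt, dec_tgt = Encoder(), Encoder(), Decoder()
translator, critic = Translator(), Critic()
bce = nn.BCEWithLogitsLoss()

x_src = torch.rand(8, 1, 28, 28)  # batch of source-domain images
x_tgt = torch.rand(8, 1, 28, 28)  # batch of target-domain images

with torch.no_grad():  # frozen encoders
    z_src = reparameterize(*enc_src(x_src))
    z_tgt = reparameterize(*enc_tgt(x_tgt))

z_fake = translator(z_src)
d_loss = (bce(critic(z_tgt), torch.ones(8, 1))
          + bce(critic(z_fake.detach()), torch.zeros(8, 1)))
g_loss = bce(critic(z_fake), torch.ones(8, 1))  # translator fools the critic

x_translated = dec_tgt(z_fake)  # decode translated code with the target decoder
print(d_loss.item(), g_loss.item(), x_translated.shape)
```

In this sketch the adversarial game is played entirely in latent space, which is what separates the approach from pixel-level translation GANs; the perceptual losses mentioned above would be added as extra terms on the decoded images.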