The variational auto-encoder (VAE) is a powerful unsupervised learning framework for image generation. One drawback of the VAE is that it generates blurry images due to its Gaussianity assumption and the resulting L2 loss. To allow the VAE to generate high-quality images, we increase the capacity of the decoder network by employing residual blocks and skip connections, which also enable efficient optimization. To overcome the limitation of the L2 loss, we propose to generate images in a multi-stage manner, from coarse to fine. In the simplest case, the proposed multi-stage VAE divides the decoder into two components, in which the second component generates refined images based on the coarse images generated by the first component. Since the second component is independent of the VAE model, it can employ loss functions other than the L2 loss as well as different model architectures. The proposed framework can be easily generalized to contain more than two components. Experimental results on the MNIST and CelebA datasets demonstrate that the proposed multi-stage VAE generates sharper images than the original VAE.
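The two-stage pipeline described above can be sketched as follows. This is a minimal NumPy illustration of the data flow only, not the paper's implementation: the weights are random placeholders standing in for trained parameters, and the function names (`decode_coarse`, `refine`) are hypothetical. The key structural point it shows is that the refinement stage is a separate module with a skip connection, so it could be trained with a different loss and architecture than the VAE decoder.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim, img_dim = 8, 28 * 28  # e.g. MNIST-sized flattened images

# Stage 1: VAE decoder, which would be trained with the usual L2
# reconstruction loss and produces a coarse image from a latent code.
W1 = rng.standard_normal((latent_dim, img_dim)) * 0.1
def decode_coarse(z):
    return 1.0 / (1.0 + np.exp(-z @ W1))  # sigmoid -> coarse image in [0, 1]

# Stage 2: independent refinement component. Because it is decoupled
# from the VAE, it could use another loss (e.g. adversarial) and a
# deeper architecture; here a single residual layer stands in for it.
W2 = rng.standard_normal((img_dim, img_dim)) * 0.01
def refine(coarse):
    # skip connection: refined = coarse + learned residual
    return np.clip(coarse + np.tanh(coarse @ W2), 0.0, 1.0)

z = rng.standard_normal(latent_dim)   # sample a latent code
coarse = decode_coarse(z)             # stage 1: coarse image
fine = refine(coarse)                 # stage 2: refined image
```

Generalizing to more than two components amounts to chaining further `refine`-style stages, each taking the previous stage's output as input.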