Taking a photo outside, can we predict the immediate future, \textit{e.g.}, how would the cloud move in the sky? We address this problem by presenting a generative adversarial network (GAN) based two-stage approach to generating realistic time-lapse videos of high resolution. Given the first frame, our model learns to generate long-term future frames. The first stage generates videos with realistic content for each frame. The second stage refines the generated video from the first stage by enforcing it to be closer to real videos with regard to motion dynamics. To further encourage vivid motion in the final generated video, the Gram matrix is employed to model motion more precisely. We build a large-scale time-lapse dataset and test our approach on this new dataset. Using our model, we are able to generate realistic videos of up to $128\times 128$ resolution for 32 frames. Quantitative and qualitative experimental results demonstrate the superiority of our model over state-of-the-art models.
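The abstract mentions using the Gram matrix to model motion. As a minimal sketch of what a Gram matrix of a feature map looks like (the authors' exact formulation, feature extractor, and normalization are not specified here, so the shapes and scaling below are assumptions for illustration):

```python
import numpy as np

def gram_matrix(features):
    """Channel-wise Gram matrix of a feature map.

    features: array of shape (C, H, W) -- per-channel activations.
    Returns a (C, C) matrix of inner products between flattened
    channels, normalized by the number of spatial positions.
    (Normalization choice is an assumption, not from the paper.)
    """
    c, h, w = features.shape
    f = features.reshape(c, h * w)      # flatten spatial dimensions
    return (f @ f.T) / (h * w)          # pairwise channel correlations

# Hypothetical example: an 8-channel 16x16 feature map
feat = np.random.default_rng(0).standard_normal((8, 16, 16))
g = gram_matrix(feat)
print(g.shape)  # (8, 8)
```

Because the Gram matrix captures correlations between channels rather than raw per-pixel values, matching it between generated and real videos encourages similar activation statistics, which is one plausible way such a term can promote realistic motion dynamics.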