The present invention relates to a deep neural network learning method capable of extracting, through unsupervised learning, the visual features necessary for autonomous motion of a mobile agent by using only actual image data and control signal data. The deep neural network learning method includes the steps of: (a) calculating a convolutional neural network (CNN) output value with each of a plurality of CNNs, the CNNs respectively receiving as input values a plurality of input images comprising a current input image inputted to the mobile agent and at least one input image preceding it in the sequence; (b) calculating an LSTM output value for the current input image with a convolutional LSTM whose input values are the CNN output value for the current input image and the LSTM output value obtained for the input image immediately preceding the current input image; and (c) generating a next predicted image, predicted as the next input image to be inputted to the mobile agent, through a spatial transformer network (STN) that receives the LSTM output value for the current input image and a control signal of the mobile agent. COPYRIGHT KIPO 2018
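The data flow of steps (a) to (c) can be illustrated with a minimal NumPy sketch. It is not the claimed implementation: the single-channel one-layer CNN, the gate kernels `Wx` and `Wh`, the mean-pooled localisation input, the weight matrix `W_loc`, the two-element control signal, and the choice to warp the current input image with an affine transform are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def conv2d_same(x, k):
    """Single-channel 2-D cross-correlation, zero padded (odd kernel sizes)."""
    kh, kw = k.shape
    xp = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros(x.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += k[i, j] * xp[i:i + x.shape[0], j:j + x.shape[1]]
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cnn(image, k):
    # Step (a): toy one-layer CNN (convolution + ReLU); assumed architecture.
    return np.maximum(conv2d_same(image, k), 0.0)

def conv_lstm_step(x, h, c, Wx, Wh):
    # Step (b): convolutional LSTM cell. Inputs are the CNN output for the
    # current image (x) and the LSTM output for the previous image (h).
    pre = {g: conv2d_same(x, Wx[g]) + conv2d_same(h, Wh[g]) for g in "ifog"}
    i, f, o = sigmoid(pre["i"]), sigmoid(pre["f"]), sigmoid(pre["o"])
    c_new = f * c + i * np.tanh(pre["g"])
    h_new = o * np.tanh(c_new)
    return h_new, c_new

def stn_predict(image, h, control, W_loc):
    # Step (c): spatial transformer. Assumed localisation: a linear map from
    # [mean of LSTM output, control signal] to an offset from the identity.
    feat = np.concatenate([[h.mean()], control])
    theta = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]) + (W_loc @ feat).reshape(2, 3)
    H, W = image.shape
    ys, xs = np.meshgrid(np.linspace(-1, 1, H), np.linspace(-1, 1, W), indexing="ij")
    grid = np.stack([xs.ravel(), ys.ravel(), np.ones(H * W)])  # shape (3, H*W)
    sx, sy = theta @ grid                                      # sampling coordinates
    px = (sx + 1) * (W - 1) / 2                                # to pixel units
    py = (sy + 1) * (H - 1) / 2
    x0 = np.clip(np.floor(px).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(py).astype(int), 0, H - 2)
    wx = np.clip(px - x0, 0.0, 1.0)                            # clamp at the border
    wy = np.clip(py - y0, 0.0, 1.0)
    out = (image[y0, x0] * (1 - wx) * (1 - wy) + image[y0, x0 + 1] * wx * (1 - wy)
           + image[y0 + 1, x0] * (1 - wx) * wy + image[y0 + 1, x0 + 1] * wx * wy)
    return out.reshape(H, W)

rng = np.random.default_rng(0)
frames = rng.random((3, 8, 8))    # two previous images followed by the current one
control = np.array([0.1, -0.2])   # hypothetical control signal
k_cnn = 0.1 * rng.normal(size=(3, 3))
Wx = {g: 0.1 * rng.normal(size=(3, 3)) for g in "ifog"}
Wh = {g: 0.1 * rng.normal(size=(3, 3)) for g in "ifog"}
W_loc = np.zeros((6, 3))          # zero weights give the identity transform

h = np.zeros((8, 8))
c = np.zeros((8, 8))
for frame in frames:              # each LSTM step reuses the previous step's output
    h, c = conv_lstm_step(cnn(frame, k_cnn), h, c, Wx, Wh)
pred = stn_predict(frames[-1], h, control, W_loc)  # next predicted image
```

With `W_loc` set to zero the transform is the identity, so `pred` reproduces the current image; nonzero weights would let the control signal shift, scale, or shear the prediction. The abstract does not state a training objective. An unsupervised setup would presumably compare `pred` with the image actually observed next, but that is an assumption here.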