An apparatus for performing continuous actions includes a memory storing instructions and a processor configured to execute the instructions to obtain a first action of an agent, based on a current state of the agent, using a cross-entropy guided policy (CGP) neural network, and to control performance of the obtained first action. The CGP neural network is trained using a cross-entropy method (CEM) policy neural network, which obtains a second action of the agent based on an input state of the agent. The CEM policy neural network is itself trained using a CEM, separately from the training of the CGP neural network.
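The two-stage arrangement described above can be illustrated with a minimal sketch. It is an illustration under stated assumptions, not the claimed implementation: `toy_q`, `cem_action`, the population and elite sizes, and the linear least-squares policy standing in for the CGP neural network are all hypothetical choices made here for brevity. The sketch first uses a CEM loop to pick an action that maximizes a scoring function for a given state, then fits a separate policy to reproduce those CEM-selected actions, so that at run time an action is obtained in a single forward pass.

```python
import numpy as np

def toy_q(state, actions):
    # Hypothetical scoring function: highest when action == 0.5 * state.
    return -np.sum((actions - 0.5 * state) ** 2, axis=1)

def cem_action(q_fn, state, action_dim, iters=8, pop=64, elite=6, seed=0):
    """Iteratively refit a Gaussian over actions to the top-scoring samples."""
    rng = np.random.default_rng(seed)
    mean = np.zeros(action_dim)
    std = np.ones(action_dim)
    for _ in range(iters):
        # Sample candidate actions and keep them inside the bounds [-1, 1].
        actions = np.clip(rng.normal(mean, std, size=(pop, action_dim)), -1.0, 1.0)
        scores = q_fn(state, actions)
        # Keep the elite (highest-scoring) candidates and refit the Gaussian.
        elites = actions[np.argsort(scores)[-elite:]]
        mean = elites.mean(axis=0)
        std = elites.std(axis=0) + 1e-6
    return mean

# Stage 1: collect CEM-selected actions for a batch of input states.
rng = np.random.default_rng(42)
states = rng.uniform(-1.0, 1.0, size=(32, 2))
cem_actions = np.stack(
    [cem_action(toy_q, s, action_dim=2, seed=i) for i, s in enumerate(states)]
)

# Stage 2: fit a separate policy to imitate the CEM actions.
# A linear map is an assumed stand-in for the CGP neural network.
weights, *_ = np.linalg.lstsq(states, cem_actions, rcond=None)

def cgp_policy(state):
    # Single forward pass; no sampling loop is needed at inference time.
    return state @ weights

first_action = cgp_policy(np.array([0.4, -0.8]))
```

The design point of the sketch is the cost difference at inference: `cem_action` evaluates the scoring function `iters * pop` times per state, whereas `cgp_policy` is one matrix product.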