This study applies neuro-dynamic programming (NDP) to acquire policies that generate continuous, precise control signals for robots. A numerical simulation shows that, compared with standard Q-learning, NDP achieves continuous actions, better control performance, and a shorter learning period. Moreover, NDP is applicable to controlling a robot whose dynamics change discontinuously because of environmental constraints.