Disturbance rejection is one of the most important abilities required of biped walkers. In this study, we propose a dynamic programming method for biped walking and apply it to a simple passive dynamic walker (PDW) on an irregular slope. The key to the proposed approach is to exploit the transient dynamics of the walker just before it reaches the falling state in the absence of any control input, and to derive the optimal control policy in a low-dimensional latent space. In our recent study, we found that such transient dynamics are closely related to the basin of attraction of a stable gait. By assigning latent coordinates to these structures in each Poincaré section and defining the reward function according to the survival time of the transient dynamics, the so-called escape time, we construct a Markov decision process (MDP) for the PDW and obtain an optimal policy using dynamic programming (DP). We show that the proposed method succeeds in controlling the PDW even when the disturbance is relatively large and the coordinates are reduced to a low dimension.
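As a rough illustration of the DP step only, the following sketch runs standard value iteration on a toy three-state MDP. Everything in it is an assumption made for illustration and is not taken from the study: the three discretized latent states, the two actions, the transition probabilities in `P`, the escape-time-like rewards in `R`, the discount factor `gamma`, and the helper name `value_iteration`. It does not model the PDW or its Poincaré sections.

```python
import numpy as np


def value_iteration(P, R, gamma=0.95, tol=1e-8, max_iter=10_000):
    """Value iteration for a finite MDP (hypothetical helper).

    P: array of shape (A, S, S), P[a, s, s2] = Pr(next state s2 | state s, action a).
    R: array of shape (S,), reward received on arriving in each state.
    Returns the state values V and a greedy policy (one action index per state).
    """
    V = np.zeros(P.shape[1])
    for _ in range(max_iter):
        # Q[a, s] = sum over s2 of P[a, s, s2] * (R[s2] + gamma * V[s2])
        Q = P @ (R + gamma * V)
        V_new = Q.max(axis=0)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    return V, Q.argmax(axis=0)


# Toy latent states (assumed): 0 = fallen (absorbing), 1 = near the basin
# boundary, 2 = well inside the basin of attraction.
# Toy actions (assumed): 0 = no control input, 1 = corrective input.
P = np.array([
    [[1.00, 0.00, 0.00],   # action 0, from state 0
     [0.60, 0.30, 0.10],   # action 0, from state 1
     [0.05, 0.25, 0.70]],  # action 0, from state 2
    [[1.00, 0.00, 0.00],   # action 1, from state 0
     [0.10, 0.30, 0.60],   # action 1, from state 1
     [0.05, 0.15, 0.80]],  # action 1, from state 2
])

# Assumed escape-time-like rewards: longer survival deeper inside the basin.
R = np.array([0.0, 2.0, 10.0])

V, policy = value_iteration(P, R)
print("state values:", V)
print("greedy policy:", policy)
```

In this toy setting the greedy policy applies the corrective input near the basin boundary, since that action shifts probability mass away from the fallen state and toward states with longer escape times.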