In this paper, we propose a Q-learning method that uses multiple Q-values, associated respectively with maximizing rewards and minimizing punishments. Our aim is a system that learns complex emotional behaviors through behavior selection that adaptively shifts between positive and negative tendencies depending on the situation. We also report the results of a computer simulation experiment conducted to confirm the effectiveness of the proposed method.
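As an illustration only, the general idea of keeping separate Q-values for rewards and for punishments can be sketched as follows. This is not the authors' method. The class name `DualQAgent`, the combined score `Q_r - w * Q_p`, the weight `w`, and the epsilon-greedy selection rule are all assumptions made for this sketch. The reward table is updated toward a max target and the punishment table toward a min target, mirroring the maximization/minimization split described above.

```python
import random
from collections import defaultdict


class DualQAgent:
    """Hypothetical sketch: separate Q-tables for reward and punishment.

    q_r estimates discounted future reward (to be maximized);
    q_p estimates discounted future punishment (to be minimized).
    The weight `w` is an assumption, not taken from the paper: it sets
    how strongly punishment avoidance counts against reward seeking.
    """

    def __init__(self, actions, alpha=0.1, gamma=0.9, w=1.0, epsilon=0.1):
        self.actions = list(actions)
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.w = w              # assumed trade-off weight (hypothetical)
        self.epsilon = epsilon  # exploration probability
        self.q_r = defaultdict(float)  # (state, action) -> expected reward
        self.q_p = defaultdict(float)  # (state, action) -> expected punishment

    def score(self, s, a):
        # Assumed combination: prefer high reward value and low punishment value.
        return self.q_r[(s, a)] - self.w * self.q_p[(s, a)]

    def select(self, s):
        # Epsilon-greedy selection over the combined score.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.score(s, a))

    def update(self, s, a, reward, punishment, s_next):
        # Q-learning style targets: max over next actions for reward,
        # min over next actions for punishment, each on its own table.
        best_r = max(self.q_r[(s_next, b)] for b in self.actions)
        best_p = min(self.q_p[(s_next, b)] for b in self.actions)
        self.q_r[(s, a)] += self.alpha * (reward + self.gamma * best_r - self.q_r[(s, a)])
        self.q_p[(s, a)] += self.alpha * (punishment + self.gamma * best_p - self.q_p[(s, a)])


if __name__ == "__main__":
    agent = DualQAgent(["approach", "avoid"], alpha=0.5, gamma=0.0, w=1.0, epsilon=0.0)
    agent.update("s0", "approach", reward=1.0, punishment=2.0, s_next="s1")
    agent.update("s0", "avoid", reward=0.5, punishment=0.0, s_next="s1")
    print(agent.select("s0"))  # punishment-averse choice
    agent.w = 0.1              # weaker punishment weight
    print(agent.select("s0"))  # reward-seeking choice
```

In the usage example, lowering `w` flips the greedy choice from the punishment-averse action to the reward-seeking one, which is one simple way a situation-dependent weight could yield more "positive" or more "negative" behavior. Bootstrapping each table independently is a design choice of this sketch; an alternative would evaluate both tables at the action that maximizes the combined score.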