In this paper, we propose a Profit Sharing method that can learn a deterministic policy in POMDP environments. The proposed method obtains the deterministic policy by using the history of observations. First, it detects the states subject to perceptual aliasing, where different states are perceived as the same observation. In such aliased states, the action is selected based on the history of observations. To use observation histories in action selection, rules over observation sequences and their values are defined. The proposed method can thus finally learn a deterministic policy by considering the history of observations where needed. We carried out a series of computer experiments and confirmed that the proposed method can detect perceptually aliased states in POMDP environments and can obtain a deterministic policy using the values of the observation-sequence rules.
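The idea above can be sketched as follows. This is a minimal, hypothetical illustration (not the authors' implementation): rule weights are keyed on observation histories, a plain single-observation key is used for ordinary states, a longer history key for states flagged as aliased, and Profit Sharing distributes the goal reward back along the episode with geometrically decreasing credit. The class name, the fixed history length of two, and the `flag_aliased` hook are assumptions for illustration; in the paper, aliasing is detected from experience.

```python
import random
from collections import defaultdict

class HistoryProfitSharing:
    """Sketch of Profit Sharing with observation-history rules (illustrative)."""

    def __init__(self, actions, discount=0.5):
        self.actions = actions
        self.discount = discount            # geometric credit-decay factor
        self.weights = defaultdict(float)   # (history key, action) -> rule value
        self.aliased = set()                # observations flagged as aliased

    def _key(self, history):
        # Ordinary states use the last observation only; aliased states
        # use a longer observation sequence to disambiguate them.
        obs = history[-1]
        return tuple(history[-2:]) if obs in self.aliased else (obs,)

    def flag_aliased(self, obs):
        # Stand-in for the paper's aliasing detection, which is learned
        # from experience; here the caller flags aliased observations.
        self.aliased.add(obs)

    def select_action(self, history):
        key = self._key(history)
        scored = [(self.weights[(key, a)], a) for a in self.actions]
        best = max(scored)[0]
        return random.choice([a for v, a in scored if v == best])

    def reinforce(self, episode, reward):
        # Profit Sharing: on reaching the goal, share the reward over the
        # episode's rules with geometrically decreasing credit.
        credit = reward
        for history, action in reversed(episode):
            self.weights[(self._key(history), action)] += credit
            credit *= self.discount
```

After a rewarded episode, rules keyed on longer observation sequences accumulate value only at aliased observations, so the greedy policy becomes deterministic while history is consulted only where needed.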