In reinforcement learning, algorithms frequently have to handle both continuous state spaces and continuous action spaces in order to achieve accurate control. This paper combines the strength of kernel methods in handling continuous state spaces with the advantage of actor-critic methods in dealing with continuous action spaces, and proposes kernel-based continuous-action actor-critic learning (KCACL). In KCACL, the actor updates each action probability according to a reward-inaction scheme, and the critic updates the state value function with online selective kernel-based temporal difference (OSKTD) learning. Experimental results demonstrate the effectiveness of the proposed algorithm.
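A minimal sketch of the update structure described above, assuming a Gaussian kernel over a fixed dictionary of state centers and a discretized action-probability vector; the function and parameter names (e.g. kcacl_step, sigma, alpha, beta) are hypothetical, and the OSKTD online selection (sparsification) rule from the original papers is omitted here.

```python
import numpy as np

def gaussian_kernel(s, centers, sigma=0.5):
    """Kernel features of state s against the current dictionary of centers."""
    d = np.linalg.norm(centers - s, axis=1)
    return np.exp(-(d ** 2) / (2 * sigma ** 2))

def kcacl_step(s, a_idx, r, s_next, centers, w, probs,
               alpha=0.1, beta=0.05, gamma=0.95):
    """One illustrative actor-critic update (assumed form, not the authors' exact rule).

    critic: kernel-based TD(0) on the value function V(s) = w^T k(s)
    actor : reward-inaction update of the action-probability vector
    """
    k, k_next = gaussian_kernel(s, centers), gaussian_kernel(s_next, centers)
    delta = r + gamma * w @ k_next - w @ k      # TD error
    w = w + alpha * delta * k                   # critic update

    if delta > 0:                               # "reward": reinforce the chosen action
        onehot = np.eye(len(probs))[a_idx]
        probs = probs + beta * delta * (onehot - probs)
        probs = np.clip(probs, 0.0, None)
        probs = probs / probs.sum()
    # "inaction": leave the probabilities unchanged when delta <= 0
    return w, probs, delta
```

The reward-inaction scheme only shifts probability mass toward the selected action when the TD error is positive; a non-positive TD error leaves the actor untouched, which is what distinguishes it from a reward-penalty scheme.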