Journal: Neural Networks: The Official Journal of the International Neural Network Society

Kernel dynamic policy programming: Applicable reinforcement learning to robot systems with high dimensional states



Abstract

We propose a new value function approach for model-free reinforcement learning in Markov decision processes with high-dimensional states. It addresses the issues of brittleness and intractable computational complexity, thereby making value-function-based reinforcement learning algorithms applicable to high-dimensional systems. Our new algorithm, Kernel Dynamic Policy Programming (KDPP), smoothly updates the value function in accordance with the Kullback–Leibler divergence between the current and updated policies. Stabilizing the learning in this manner enables the application of the kernel trick to value function approximation, which greatly reduces the computational requirements for learning in high-dimensional state spaces. The performance of KDPP against other kernel-trick-based value function approaches is first investigated in a simulated n-DOF manipulator reaching task, where only KDPP efficiently learned a viable policy at n = 40. As an application to a real-world high-dimensional robot system, KDPP successfully learned the task of unscrewing a bottle cap with a Pneumatic Artificial Muscle (PAM) driven robotic hand with tactile sensors, a system with a 32-dimensional state space, given limited samples and ordinary computing resources.
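The abstract does not spell out the update rule, so the following is only a minimal sketch of the underlying idea, not the authors' implementation. It assumes the tabular Dynamic Policy Programming recursion on action preferences, in which a Boltzmann (soft-max) policy changes gradually and the Kullback–Leibler divergence between successive policies can be monitored. The kernel-based function approximation that gives KDPP its name is omitted. The toy three-state MDP and all function names (`toy_chain_mdp`, `dpp_step`, `run_dpp`, and so on) are hypothetical and chosen only for illustration.

```python
import numpy as np


def boltzmann_policy(psi, eta):
    """Soft-max policy pi(a|s) proportional to exp(eta * psi[s, a])."""
    z = eta * (psi - psi.max(axis=1, keepdims=True))  # shift for numerical stability
    w = np.exp(z)
    return w / w.sum(axis=1, keepdims=True)


def dpp_step(psi, P, R, gamma, eta):
    """One tabular DPP-style update of the action preferences psi[s, a].

    Assumed recursion (not taken from the abstract):
      psi'(s, a) = psi(s, a) - m(s) + R(s, a) + gamma * sum_s' P(s'|s, a) m(s')
    where m(s) is the Boltzmann-weighted average of psi(s, .).
    """
    pi = boltzmann_policy(psi, eta)
    m = (pi * psi).sum(axis=1)  # shape (S,)
    return psi - m[:, None] + R + gamma * (P @ m)


def kl_per_state(p, q, eps=1e-12):
    """KL(p || q) for each state; eps guards against log(0)."""
    return (p * (np.log(p + eps) - np.log(q + eps))).sum(axis=1)


def toy_chain_mdp():
    """Hypothetical 3-state, 2-action MDP used only for illustration."""
    S, A = 3, 2
    P = np.zeros((S, A, S))
    R = np.zeros((S, A))
    for s in range(S):
        P[s, 0, s] = 1.0            # action 0: stay in place
        P[s, 1, (s + 1) % S] = 1.0  # action 1: move to the next state (cyclic)
    R[2, 0] = 1.0                   # reward only for staying in state 2
    return P, R


def run_dpp(P, R, gamma=0.9, eta=1.0, n_iters=200):
    """Iterate the update and record the largest per-state KL at each step."""
    psi = np.zeros_like(R, dtype=float)
    kls = []
    for _ in range(n_iters):
        old_pi = boltzmann_policy(psi, eta)
        psi = dpp_step(psi, P, R, gamma, eta)
        new_pi = boltzmann_policy(psi, eta)
        kls.append(kl_per_state(new_pi, old_pi).max())
    return psi, boltzmann_policy(psi, eta), kls


if __name__ == "__main__":
    P, R = toy_chain_mdp()
    psi, pi, kls = run_dpp(P, R)
    print("greedy actions per state:", psi.argmax(axis=1))
    print("largest policy change (KL) at first and last step:", kls[0], kls[-1])
```

In this sketch the inverse temperature `eta` controls how sharply the policy follows the preferences. A kernel-based variant would replace the table `psi` with a function approximator over the high-dimensional state.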
