Kernel dynamic policy programming: Applicable reinforcement learning to robot systems with high dimensional states

Yunduan Cui; Takamitsu Matsubara; Kenji Sugimoto

Abstract

We propose a new value function approach for model-free reinforcement learning in Markov decision processes with high dimensional states. It addresses the issues of brittleness and intractable computational complexity, thereby making value function based reinforcement learning algorithms applicable to high dimensional systems. Our new algorithm, Kernel Dynamic Policy Programming (KDPP), smoothly updates the value function in accordance with the Kullback–Leibler divergence between the current and updated policies. Stabilizing learning in this manner enables the kernel trick to be applied to value function approximation, which greatly reduces the computational requirements of learning in high dimensional state spaces. The performance of KDPP is first compared against other kernel trick based value function approaches in a simulated n-DOF manipulator reaching task, where only KDPP efficiently learned a viable policy at n = 40. As an application to a real-world high dimensional robot system, KDPP successfully learned to unscrew a bottle cap with a Pneumatic Artificial Muscle (PAM) driven robotic hand with tactile sensors, a system with a 32-dimensional state space, given limited samples and ordinary computing resources.
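
The abstract gives no equations. As a rough illustration of the kind of smooth, Kullback–Leibler regularised update it describes, the sketch below runs a tabular Dynamic Policy Programming style update on a hypothetical two-state, two-action MDP. This is not the authors' KDPP: it assumes a known transition model and a lookup table in place of model-free learning with kernel function approximation. The names `P`, `R`, `eta`, `gamma`, `boltzmann_policy` and `dpp_update` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def boltzmann_policy(psi, eta):
    """Softmax policy over action preferences psi[s, a] (eta = inverse temperature)."""
    z = eta * (psi - psi.max(axis=1, keepdims=True))  # shift by row max for stability
    w = np.exp(z)
    return w / w.sum(axis=1, keepdims=True)

def dpp_update(psi, P, R, eta, gamma):
    """One DPP-style update of the action preferences.

    psi: (S, A) preferences, P: (S, A, S) transition probabilities,
    R: (S, A) rewards. Assumes the model (P, R) is known, unlike the paper's setting.
    """
    pi = boltzmann_policy(psi, eta)
    m = (pi * psi).sum(axis=1)  # Boltzmann-weighted average preference per state
    # Preferences change by the reward plus the discounted averaged next-state
    # preference, minus the averaged preference of the current state.
    return psi - m[:, None] + R + gamma * (P @ m)

# Hypothetical toy MDP: action 0 stays, action 1 switches state;
# reward 1 whenever the next state is state 1.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = P[0, 1, 1] = P[1, 0, 1] = P[1, 1, 0] = 1.0
R = np.array([[0.0, 1.0],
              [1.0, 0.0]])

psi = np.zeros((2, 2))
for _ in range(50):
    psi = dpp_update(psi, P, R, eta=1.0, gamma=0.9)

policy = boltzmann_policy(psi, eta=1.0)
print(policy.argmax(axis=1))
```

In this toy problem the learned policy prefers switching in state 0 and staying in state 1. The gap between preferred and non-preferred actions grows gradually across iterations rather than jumping in one step, which is the smoothing behaviour the abstract refers to.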


Bibliographic details

Source
Neural Networks: The Official Journal of the International Neural Network Society | 2017 | 11 pages
Authors
Yunduan Cui; Takamitsu Matsubara; Kenji Sugimoto;
Author affiliations

Graduate School of Information Science, Nara Institute of Science and Technology (listed for each of the three authors)

Indexing information
Original format: PDF
Language: English
Chinese Library Classification: Neurology
Keywords
Reinforcement learning; Kernel methods; Robot learning;


Similar literature

1. Kernel dynamic policy programming: Applicable reinforcement learning to robot systems with high dimensional states [J]. Yunduan Cui, Takamitsu Matsubara, Kenji Sugimoto. Neural Networks: The Official Journal of the International Neural Network Society. 2017
2. Reinforcement Learning endowed with safe veto policies to learn the control of Linked-Multicomponent Robotic Systems [J]. Fernandez-Gauna Borja, Grana Manuel, Manuel Lopez-Guede Jose. Information Sciences: An International Journal. 2015
3. Genetic Network Programming-Reinforcement Learning Based Safe and Smooth Mobile Robot Navigation in Unknown Dynamic Environments [J]. Ahmed H. M. Findi, Mohammad H. Marhaban, Raja Kamil. Journal of Theoretical and Applied Information Technology. 2017, No. 11
4. Kernel dynamic policy programming: Practical reinforcement learning for high-dimensional robots [C]. Yunduan Cui, Takamitsu Matsubara, Kenji Sugimoto. IEEE-RAS International Conference on Humanoid Robots. 2016
5. Coping with the curse of dimensionality by combining linear programming and reinforcement learning [D]. Burton, Scott H. 2010
6. Exploring Feature Dimensions to Learn a New Policy in an Uninformed Reinforcement Learning Task [O]. Oh-hyeon Choung, Sang Wan Lee, Yong Jeong
7. Robot reinforcement learning accuracy-based learning classifier systems with Fuzzy Policy Gradient descent (XCS-FPGRL) [O]. Jie Shao, Jingru Yu. 2015
