International Conference on Neural Information Processing (ICONIP 2010)

Extraction of Reward-Related Feature Space Using Correlation-Based and Reward-Based Learning Methods

Abstract

The purpose of this article is to present a novel learning paradigm that extracts a reward-related low-dimensional state space by combining correlation-based learning, such as Input Correlation Learning (ICO learning), with reward-based learning, such as Reinforcement Learning (RL). Since ICO learning can quickly find a correlation between a state and an unwanted condition (e.g., failure), we use it to extract a low-dimensional feature space in which a failure avoidance policy can be found. The extracted feature space is then used as a prior for RL. If a proper feature space can be extracted for a given task, the model of the policy can be kept simple and the policy can be improved easily. The performance of this learning paradigm is evaluated through simulation of a cart-pole system. The results show that the proposed method can enhance the feature extraction process and find a proper feature space for a pole balancing policy. That is, it allows a policy to stabilize the pole effectively over the largest domain of initial conditions, compared to using only ICO learning or only RL without any prior knowledge.
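For concreteness, the following is a minimal Python sketch of the paradigm the abstract describes, under assumed details: a discrete-time ICO-style weight update that correlates predictive state inputs with the change of a failure (reflex) signal, and a projection of the raw state onto the learned weights as the low-dimensional feature handed to RL. The function names, learning rate, synthetic cart-pole-like states, and failure criterion are illustrative only and are not taken from the paper.

```python
import numpy as np

def ico_update(w, x, reflex, reflex_prev, mu=0.01):
    """One ICO-style learning step: dw_i ~ mu * x_i * d(reflex)/dt (assumed discrete-time form)."""
    d_reflex = reflex - reflex_prev          # discrete derivative of the failure/reflex signal
    return w + mu * x * d_reflex

def extract_feature(w, x):
    """Project the raw state onto the ICO-learned direction (the low-dimensional feature for RL)."""
    return float(np.dot(w, x))

# Toy usage on synthetic 4-D cart-pole-like states (x, x_dot, theta, theta_dot).
rng = np.random.default_rng(0)
w = np.zeros(4)
reflex_prev = 0.0
for _ in range(1000):
    state = rng.normal(size=4)
    # Assumed failure signal: the reflex fires when the pole angle exceeds a threshold.
    reflex = 1.0 if abs(state[2]) > 1.5 else 0.0
    w = ico_update(w, state, reflex, reflex_prev)
    reflex_prev = reflex

feature = extract_feature(w, rng.normal(size=4))
print("learned projection:", w, "example feature value:", feature)
```

In the abstract's terms, the learned weight vector plays the role of the extracted feature space: a subsequent RL policy would be defined over this scalar (or a few such) feature(s) rather than over the full state, which is what makes the policy model simple and easy to improve.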
