International Conference on Neural Information Processing

Extraction of Reward-Related Feature Space Using Correlation-Based and Reward-Based Learning Methods

Abstract

The purpose of this article is to present a novel learning paradigm that extracts a reward-related low-dimensional state space by combining correlation-based learning, such as Input Correlation Learning (ICO learning), with reward-based learning, such as Reinforcement Learning (RL). Since ICO learning can quickly find a correlation between a state and an unwanted condition (e.g., failure), we use it to extract a low-dimensional feature space in which a failure-avoidance policy can be found. The extracted feature space is then used as a prior for RL. If a proper feature space can be extracted for a given task, the policy model can be kept simple and the policy can be improved easily. The performance of this learning paradigm is evaluated through simulation of a cart-pole system. The results show that the proposed method enhances the feature extraction process so that a proper feature space for a pole-balancing policy is found. That is, it allows a policy to stabilize the pole effectively over a larger domain of initial conditions than using ICO learning alone or using RL alone without any prior knowledge.
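As a rough, non-authoritative illustration of the paradigm the abstract describes, the sketch below runs the standard input-correlation (ICO) weight update, dw_i ∝ u_i · du_0/dt, on a simplified cart-pole: the reflex signal u_0 fires when the pole nears failure, the learned weights w indicate which state variables correlate with failure, and the projection w·s is the kind of low-dimensional feature a downstream RL policy would act on. This is a simplified reading of the abstract, not the authors' implementation; the dynamics constants, reflex threshold, learning rate, and bang-bang controller are illustrative assumptions.

```python
import numpy as np

GRAVITY, CART_M, POLE_M, POLE_L, DT = 9.8, 1.0, 0.1, 0.5, 0.02

def step_cart_pole(s, force):
    """One Euler step of the classic cart-pole dynamics; s = [x, x_dot, theta, theta_dot]."""
    x, x_dot, th, th_dot = s
    total_m = CART_M + POLE_M
    tmp = (force + POLE_M * POLE_L * th_dot ** 2 * np.sin(th)) / total_m
    th_acc = (GRAVITY * np.sin(th) - np.cos(th) * tmp) / (
        POLE_L * (4.0 / 3.0 - POLE_M * np.cos(th) ** 2 / total_m))
    x_acc = tmp - POLE_M * POLE_L * th_acc * np.cos(th) / total_m
    return s + DT * np.array([x_dot, x_acc, th_dot, th_acc])

def ico_update(w, u, du0, mu=0.05):
    """ICO learning: each weight changes in proportion to its input times the
    time derivative of the reflex (failure) signal u0."""
    return w + mu * u * du0

def run_ico(episodes=50, steps=500):
    w = np.zeros(4)                                # one weight per state variable
    for _ in range(episodes):
        s = np.array([0.0, 0.0, 0.05, 0.0])        # small initial pole angle
        u0_prev = 0.0
        for _ in range(steps):
            u0 = 1.0 if abs(s[2]) > 0.2 else 0.0   # reflex fires when the pole nears failure
            w = ico_update(w, s, u0 - u0_prev)     # correlate state with d(u0)/dt
            u0_prev = u0
            feature = w @ s                        # reward-related 1-D feature
            force = 10.0 if feature > 0 else -10.0 # crude bang-bang controller on the feature
            s = step_cart_pole(s, force)
            if abs(s[2]) > 0.5 or abs(s[0]) > 2.4: # hard failure ends the episode
                break
    return w                                       # large |w_i| marks failure-relevant state variables

if __name__ == "__main__":
    w = run_ico()
    print("learned ICO weights:", w)               # w @ s would serve as the low-dimensional input to RL
```

In the paper the extracted feature space is handed to RL as a prior; in this sketch that stage is only indicated by the final projection.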
