Journal of Advanced Computational Intelligence and Intelligent Informatics

Proposal of the Continuous-Valued Penalty Avoiding Rational Policy Making Algorithm



Abstract

Applying reinforcement learning to real-world problems sometimes requires handling continuous-valued input and output. We previously proposed Exploitation-oriented Learning (XoL), an approach that strongly reinforces successful experience and thereby reduces the number of trial-and-error searches. A method based on Penalty-Avoiding Rational Policy Making (PARP) has been proposed as an XoL method for continuous-valued input, but it does not handle actions with continuous-valued output. We study a treatment of continuous-valued output suitable for an XoL method in environments that include both a reward and a penalty, extending PARP from continuous-valued input to continuous-valued output. We apply our proposal to the pole-cart balancing problem and a biped LEGO robot, and confirm its effectiveness.
