Journal: SICE Journal of Control, Measurement, and System Integration (SICE JCMSI)

Acceleration of Reinforcement Learning by Controlled Use of Options Given as Prior Information

Abstract

Reinforcement learning is a method by which an agent learns an appropriate action policy for solving a problem through trial and error. Its advantage is that it can be applied to unknown or uncertain problems. Its drawback is that the trial-and-error process makes learning slow. If prior information about the environment is available, some of the trial and error can be skipped and learning takes less time. A human designer can provide this prior information in the form of options. However, the options can be wrong because of uncertainties in the problem. Using wrong options can have harmful effects, such as failing to obtain the optimal policy and slowing down learning. This paper proposes controlling the use of the options to suppress these harmful effects: the agent gradually forgets the given options as it learns a better policy. The proposed method is applied to three testbed environments with two types of prior information. It shows good results in terms of both learning speed and the quality of the obtained policies.
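The abstract does not say how the use of options is controlled, so the following is only an illustrative sketch of "gradual forgetting" and not the paper's algorithm. It assumes a hypothetical six-state corridor, tabular Q-learning, an option simplified to a per-state action suggestion (real options are temporally extended), and a probability `p_option` of following that suggestion which shrinks by a factor `decay` after every episode. All names and parameter values are assumptions made for illustration.

```python
import random

def train(option_policy, n_states=6, episodes=300, p_option=0.9, decay=0.97,
          alpha=0.5, gamma=0.9, epsilon=0.1, max_steps=200, seed=0):
    """Q-learning on a corridor; the prior option is followed less and less often."""
    rng = random.Random(seed)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]  # actions: 0 = left, 1 = right
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            if rng.random() < p_option:
                a = option_policy[s]            # follow the (possibly wrong) prior
            elif rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)            # explore, or break ties at random
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == goal else 0.0
            target = r if s2 == goal else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
            if s == goal:
                break
        p_option *= decay                       # assumed forgetting schedule
    return q, p_option

# A deliberately wrong option: it always suggests moving away from the goal.
wrong_option = [0] * 6
q, p_final = train(wrong_option)
greedy = [1 if q[s][1] > q[s][0] else 0 for s in range(5)]
```

In this toy setting the misleading option dominates early episodes, but as `p_option` decays the agent relies on its own value estimates, which is the kind of behaviour the abstract describes in words.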