2011 IEEE International Conference on Systems, Man, and Cybernetics

Reinforcement learning with nonstationary reward depending on the episode



Abstract

A model that represents nonstationary reward is proposed for reinforcement learning (RL). RL is a framework in which an agent learns through interaction with an environment: the agent receives a reward and learns its behavior from it. The reward is specified by the designer; because the agent's behavior itself does not have to be designed, RL is expected to apply to a wide range of problems. However, conventional RL algorithms work under the assumption that the environment is stationary, so they cannot accommodate nonstationary rewards or a change of objective. From the standpoint of real-world applications, the agent must be able to cope with such a change of objective. This paper proposes a learning technique that handles temporal change of the reward: in the proposed representation, the reward is divided into two parts, an episode-dependent part and an episode-independent part. Simulation experiments show the effectiveness of the proposed method.
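The abstract gives no code, but the reward decomposition it describes can be sketched concretely. Below is a minimal illustration in which the reward of a toy chain environment is split into an episode-independent step cost and an episode-dependent goal bonus whose location changes with the episode index. The chain environment, the 50-episode switching schedule, and all names are illustrative assumptions, and the learner is plain tabular Q-learning used as a stand-in, not the authors' algorithm.

```python
import numpy as np

N_STATES = 5            # chain of states 0..4; the agent starts in the middle
N_ACTIONS = 2           # 0 = move left, 1 = move right
N_EPISODES = 200
MAX_STEPS = 20

def episode_independent_reward(state):
    # Stationary part of the reward: a small step cost everywhere.
    return -0.1

def episode_dependent_reward(state, episode):
    # Nonstationary part: the rewarding goal alternates between the two
    # ends of the chain every 50 episodes (an assumed schedule).
    goal = 0 if (episode // 50) % 2 == 0 else N_STATES - 1
    return 1.0 if state == goal else 0.0

def reward(state, episode):
    # The decomposition from the abstract:
    # reward = episode-independent part + episode-dependent part.
    return episode_independent_reward(state) + episode_dependent_reward(state, episode)

# Plain tabular Q-learning as a stand-in learner.
rng = np.random.default_rng(0)
Q = np.zeros((N_STATES, N_ACTIONS))
alpha, gamma, eps = 0.1, 0.95, 0.1

for episode in range(N_EPISODES):
    state = N_STATES // 2
    for _ in range(MAX_STEPS):
        # epsilon-greedy action selection
        if rng.random() < eps:
            action = int(rng.integers(N_ACTIONS))
        else:
            action = int(np.argmax(Q[state]))
        next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
        r = reward(next_state, episode)
        done = episode_dependent_reward(next_state, episode) > 0.0  # reached the current goal
        target = r + (0.0 if done else gamma * float(np.max(Q[next_state])))
        Q[state, action] += alpha * (target - Q[state, action])
        state = next_state
        if done:
            break

print(Q)
```

In this sketch the goal swaps at fixed episode intervals, so a learner that conditions the episode-dependent part on the episode index could in principle anticipate the change, whereas a purely stationary learner must relearn its policy after every switch.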
