Annual Conference on Genetic and Evolutionary Computation

On-line evolutionary computation for reinforcement learning in stochastic domains


Abstract

In reinforcement learning, an agent interacting with its environment strives to learn a policy that specifies, for each state it may encounter, what action to take. Evolutionary computation is one of the most promising approaches to reinforcement learning, but its success is largely restricted to off-line scenarios. In on-line scenarios, an agent must strive to maximize the reward it accrues while it is learning. Temporal difference (TD) methods, another approach to reinforcement learning, naturally excel in on-line scenarios because they have selection mechanisms for balancing the need to search for better policies (exploration) with the need to accrue maximal reward (exploitation). This paper presents a novel way to strike this balance in evolutionary methods by borrowing the selection mechanisms that TD methods use to choose individual actions and using them in evolution to choose policies for evaluation. Empirical results in the mountain car and server job scheduling domains demonstrate that these techniques can substantially improve evolution's on-line performance in stochastic domains.
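The mechanism transfers directly: where a TD learner applies, say, epsilon-greedy selection to actions within a single episode, evolution can apply the same rule to decide which member of the current population receives the next evaluation episode. Below is a minimal Python sketch of that idea under stated assumptions, not the authors' implementation; the `run_episode` callback, the list-based population, and the parameter defaults are all illustrative.

```python
import random

def select_policy(fitness, epsilon=0.1):
    """Epsilon-greedy selection over policies instead of actions:
    exploit (re-evaluate the policy with the best fitness estimate)
    with probability 1 - epsilon, otherwise explore (evaluate a
    uniformly random member of the population)."""
    if random.random() < epsilon:
        return random.randrange(len(fitness))                 # explore
    return max(range(len(fitness)), key=fitness.__getitem__)  # exploit

def run_generation(population, run_episode, episodes=100, epsilon=0.1):
    """Allocate one generation's evaluation episodes epsilon-greedily,
    keeping each policy's fitness as a running average of the reward
    it earns. `run_episode(policy)` is a hypothetical stand-in for one
    episode of interaction with the (stochastic) environment."""
    fitness = [0.0] * len(population)
    counts = [0] * len(population)
    online_reward = 0.0
    for _ in range(episodes):
        i = select_policy(fitness, epsilon)
        r = run_episode(population[i])
        online_reward += r            # reward accrued *while* learning
        counts[i] += 1
        fitness[i] += (r - fitness[i]) / counts[i]  # incremental mean
    return fitness, online_reward
```

A softmax (Boltzmann) rule, which evaluates policy i with probability proportional to exp(fitness[i] / tau), could be swapped in at the same point as another TD-style selection mechanism. Either way, `online_reward` is the quantity an on-line learner tries to maximize, in contrast to off-line evolution, which spreads evaluation episodes evenly across the population regardless of the reward forgone.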
