Asia Pacific Symposium on Intelligent and Evolutionary Systems

Overtaking Method based on Variance of Values: Resolving the Exploration-Exploitation Dilemma

Abstract

The exploration-exploitation dilemma is a compelling theme in reinforcement learning. Under the trade-off framework, a reinforcement learning agent must judiciously switch between exploration and exploitation, because the action estimated as best in the current learning state may not actually be the true best. We demonstrate that, under certain conditions, an agent can identify the best action even when it only ever selects the exploitation phase. Under these conditions, the agent needs no explicit exploration phase, thereby resolving the exploration-exploitation dilemma. We also propose a value function on actions and a rule for updating it. The proposed method, the "overtaking method," can be integrated with the existing UCB1 and UCB1-tuned methods for the multi-armed bandit problem without compromising their features. The integrated models show better results than the original models.
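
For context, the UCB1 policy that the overtaking method integrates with balances exploitation and exploration by adding a confidence bonus to each arm's empirical mean reward; UCB1-tuned additionally scales that bonus by an estimate of the reward variance, which is the quantity the paper's title refers to. The sketch below is a minimal, generic implementation of the standard UCB1 baseline for the multi-armed bandit problem, not the paper's overtaking method; all function and variable names are my own.

```python
import math
import random

def ucb1(pull, n_arms, horizon):
    """Standard UCB1 baseline for the multi-armed bandit problem.

    pull(i) returns a reward in [0, 1] for arm i. This is a generic
    sketch of the baseline, not the paper's overtaking method.
    """
    counts = [0] * n_arms    # times each arm has been pulled
    means = [0.0] * n_arms   # running mean reward per arm

    for t in range(horizon):
        if t < n_arms:
            arm = t          # pull each arm once to initialize
        else:
            # exploitation term (empirical mean) + exploration bonus
            arm = max(range(n_arms),
                      key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean update
    return means, counts

# Usage: three Bernoulli arms with unknown success probabilities.
probs = [0.2, 0.5, 0.7]
means, counts = ucb1(lambda i: float(random.random() < probs[i]), 3, 10_000)
print(counts)  # the best arm (index 2) should dominate the pull counts
```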
