Procedia Computer Science

Overtaking Method based on Variance of Values: Resolving the Exploration–Exploitation Dilemma

Abstract

The exploration–exploitation dilemma is an attractive theme in reinforcement learning. Under the tradeoff framework, a reinforcement learning agent must cleverly switch between exploration and exploitation, because the action estimated as best in the current learning state may not actually be the true best. We demonstrate that, under certain conditions, an agent can determine the best action even while remaining in the exploitation phase. Under these conditions, the agent needs no explicit exploration phase, thereby resolving the exploration–exploitation dilemma. We also propose a value function on actions and an update rule for it. The proposed method, the “overtaking method,” can be integrated with the existing UCB1 and UCB1-tuned methods for the multi-armed bandit problem without compromising their features. The integrated models show better results than the original models.
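The abstract does not spell out the overtaking rule itself, so the sketch below shows only the two baseline index policies it integrates with, the standard UCB1 and UCB1-tuned rules for the K-armed bandit; the empirical-variance term in UCB1-tuned is the kind of “variance of values” the proposed method builds on. Class and function names (`UCBTunedBandit`, `select_arm`) are illustrative, not from the paper.

```python
# A minimal sketch, assuming the standard UCB1 / UCB1-tuned index rules for
# the K-armed bandit problem; the overtaking update itself is not specified
# in the abstract, so only the baselines it integrates with are shown here.
import math
import random

class UCBTunedBandit:
    """K-armed bandit player using the UCB1-tuned index rule."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms      # pulls per arm
        self.means = [0.0] * n_arms     # empirical mean reward per arm
        self.sq_sums = [0.0] * n_arms   # sum of squared rewards (for variance)

    def select_arm(self):
        # Play every arm once before trusting the confidence bounds.
        for arm, c in enumerate(self.counts):
            if c == 0:
                return arm
        n = sum(self.counts)

        def index(arm):
            c, mu = self.counts[arm], self.means[arm]
            var = max(0.0, self.sq_sums[arm] / c - mu * mu)  # empirical variance
            v = var + math.sqrt(2.0 * math.log(n) / c)       # UCB1-tuned V term
            # Plain UCB1 would instead use: mu + sqrt(2 * ln(n) / c)
            return mu + math.sqrt((math.log(n) / c) * min(0.25, v))

        return max(range(len(self.counts)), key=index)

    def update(self, arm, reward):
        self.counts[arm] += 1
        c = self.counts[arm]
        self.means[arm] += (reward - self.means[arm]) / c
        self.sq_sums[arm] += reward * reward

# Hypothetical usage: three Bernoulli arms; pulls should concentrate on arm 2.
if __name__ == "__main__":
    p = [0.3, 0.5, 0.7]
    player = UCBTunedBandit(len(p))
    for _ in range(10_000):
        arm = player.select_arm()
        player.update(arm, 1.0 if random.random() < p[arm] else 0.0)
    print(player.counts)
```

The min(1/4, V) cap in UCB1-tuned reflects that the variance of a [0, 1]-valued reward never exceeds 1/4, so low-variance arms get tighter confidence bounds; a per-arm variance statistic of this kind is presumably what an overtaking criterion would compare when deciding that one arm's value has definitively passed another's.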