Asia Pacific Symposium on Intelligent and Evolutionary Systems

Overtaking Method based on Variance of Values: Resolving the Exploration-Exploitation Dilemma

Abstract

The exploration-exploitation dilemma is a compelling theme in reinforcement learning. Under the trade-off framework, a reinforcement learning agent must judiciously switch between exploration and exploitation, because the action estimated as best in the current learning state may not actually be the true best. We demonstrate that, under certain conditions, an agent can identify the best action even when it only ever selects the exploitation phase. Under these conditions, the agent needs no explicit exploration phase, thereby resolving the exploration-exploitation dilemma. We also propose a value function on actions and a rule for updating it. The proposed method, the "overtaking method," can be integrated with the existing UCB1 and UCB1-tuned methods for the multi-armed bandit problem without compromising their features. The integrated models show better results than the original models.
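
For context, the UCB1 policy that the overtaking method integrates with balances exploitation and exploration by adding a confidence bonus to each arm's empirical mean reward; UCB1-tuned additionally scales that bonus by an estimate of the reward variance, which is the quantity the paper's title refers to. The sketch below is a minimal, generic implementation of the standard UCB1 baseline for the multi-armed bandit problem, not the paper's overtaking method; all function and variable names are my own.

```python
import math
import random

def ucb1(pull, n_arms, horizon):
    """Standard UCB1 baseline for the multi-armed bandit problem.

    pull(i) returns a reward in [0, 1] for arm i. This is a generic
    sketch of the baseline, not the paper's overtaking method.
    """
    counts = [0] * n_arms    # times each arm has been pulled
    means = [0.0] * n_arms   # running mean reward per arm

    for t in range(horizon):
        if t < n_arms:
            arm = t          # pull each arm once to initialize
        else:
            # exploitation term (empirical mean) + exploration bonus
            arm = max(range(n_arms),
                      key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean update
    return means, counts

# Usage: three Bernoulli arms with unknown success probabilities.
probs = [0.2, 0.5, 0.7]
means, counts = ucb1(lambda i: float(random.random() < probs[i]), 3, 10_000)
print(counts)  # the best arm (index 2) should dominate the pull counts
```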
