Procedia Computer Science

Overtaking Method based on Variance of Values: Resolving the Exploration–Exploitation Dilemma

Abstract

The exploration–exploitation dilemma is an attractive theme in reinforcement learning. Under the tradeoff framework, a reinforcement learning agent must cleverly switch between exploration and exploitation, because the action estimated as best in the current learning state may not actually be the true best. We demonstrate that, under certain conditions, an agent can determine the best action even while remaining in the exploitation phase. Under these conditions, the agent needs no explicit exploration phase, thereby resolving the exploration–exploitation dilemma. We also propose a value function on actions and an update rule for it. The proposed method, the “overtaking method,” can be integrated with the existing UCB1 and UCB1-tuned methods for the multi-armed bandit problem without compromising their features. The integrated models show better results than the original models.
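The abstract does not spell out the overtaking rule itself, so the sketch below shows only the two baseline index policies it integrates with, the standard UCB1 and UCB1-tuned rules for the K-armed bandit; the empirical-variance term in UCB1-tuned is the kind of “variance of values” the proposed method builds on. Class and function names (`UCBTunedBandit`, `select_arm`) are illustrative, not from the paper.

```python
# A minimal sketch, assuming the standard UCB1 / UCB1-tuned index rules for
# the K-armed bandit problem; the overtaking update itself is not specified
# in the abstract, so only the baselines it integrates with are shown here.
import math
import random

class UCBTunedBandit:
    """K-armed bandit player using the UCB1-tuned index rule."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms      # pulls per arm
        self.means = [0.0] * n_arms     # empirical mean reward per arm
        self.sq_sums = [0.0] * n_arms   # sum of squared rewards (for variance)

    def select_arm(self):
        # Play every arm once before trusting the confidence bounds.
        for arm, c in enumerate(self.counts):
            if c == 0:
                return arm
        n = sum(self.counts)

        def index(arm):
            c, mu = self.counts[arm], self.means[arm]
            var = max(0.0, self.sq_sums[arm] / c - mu * mu)  # empirical variance
            v = var + math.sqrt(2.0 * math.log(n) / c)       # UCB1-tuned V term
            # Plain UCB1 would instead use: mu + sqrt(2 * ln(n) / c)
            return mu + math.sqrt((math.log(n) / c) * min(0.25, v))

        return max(range(len(self.counts)), key=index)

    def update(self, arm, reward):
        self.counts[arm] += 1
        c = self.counts[arm]
        self.means[arm] += (reward - self.means[arm]) / c
        self.sq_sums[arm] += reward * reward

# Hypothetical usage: three Bernoulli arms; pulls should concentrate on arm 2.
if __name__ == "__main__":
    p = [0.3, 0.5, 0.7]
    player = UCBTunedBandit(len(p))
    for _ in range(10_000):
        arm = player.select_arm()
        player.update(arm, 1.0 if random.random() < p[arm] else 0.0)
    print(player.counts)
```

The min(1/4, V) cap in UCB1-tuned reflects that the variance of a [0, 1]-valued reward never exceeds 1/4, so low-variance arms get tighter confidence bounds; a per-arm variance statistic of this kind is presumably what an overtaking criterion would compare when deciding that one arm's value has definitively passed another's.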