Neurocomputing > Softmax exploration strategies for multiobjective reinforcement learning

Softmax exploration strategies for multiobjective reinforcement learning


Abstract

Despite growing interest over recent years in applying reinforcement learning to multiobjective problems, there has been little research into the applicability and effectiveness of exploration strategies within the multiobjective context. This work considers several widely-used approaches to exploration from the single-objective reinforcement learning literature, and examines their incorporation into multiobjective Q-learning. In particular this paper proposes two novel approaches which extend the softmax operator to work with vector-valued rewards. The performance of these exploration strategies is evaluated across a set of benchmark environments. Issues arising from the multiobjective formulation of these benchmarks which impact the performance of the exploration strategies are identified. It is shown that of the techniques considered, the combination of the novel softmax-epsilon exploration with optimistic initialisation provides the most effective trade-off between exploration and exploitation. (C) 2017 Elsevier B.V. All rights reserved.
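The abstract describes combining softmax-epsilon exploration with optimistic initialisation for vector-valued Q-estimates. The paper's actual softmax extensions are not reproduced here; the following is a minimal sketch of the general idea under an assumed linear scalarisation of the objectives. The function name, the `weights` parameter, and the scalarisation step are illustrative assumptions, not the paper's method.

```python
import numpy as np

def softmax_epsilon_action(q_vectors, weights, tau=1.0, epsilon=0.1, rng=None):
    """Select an action from vector-valued Q-estimates (illustrative sketch).

    q_vectors: (n_actions, n_objectives) array of Q-value vectors.
    weights:   per-objective weights for a linear scalarisation -- an
               assumption made here; the paper extends softmax to the
               vector rewards themselves.
    """
    rng = rng or np.random.default_rng()
    scalar_q = q_vectors @ weights            # collapse each Q-vector to a scalar
    if rng.random() < epsilon:                # epsilon branch: uniform exploration
        return int(rng.integers(len(scalar_q)))
    # softmax branch: numerically stable Boltzmann preferences
    prefs = np.exp((scalar_q - scalar_q.max()) / tau)
    return int(rng.choice(len(scalar_q), p=prefs / prefs.sum()))

# Optimistic initialisation: start every Q-vector above any achievable return,
# so unvisited actions keep being tried until their estimates fall.
q = np.full((4, 2), 10.0)                     # 4 actions, 2 objectives
a = softmax_epsilon_action(q, weights=np.array([0.5, 0.5]))
```

With all estimates initialised to the same optimistic value, the softmax branch selects among actions uniformly; as estimates are updated, selection probability concentrates on actions whose (scalarised) values remain high.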
