Deep Reinforcement Learning with Risk-Seeking Exploration

Abstract

In most contemporary work in deep reinforcement learning (DRL), agents are trained in simulated environments. Not only are simulated environments fast and inexpensive, they are also 'safe'. By contrast, training in a real world environment (using robots, for example) is not only slow and costly, but actions can also result in irreversible damage, either to the environment or to the agent (robot) itself. In this paper, we consider taking advantage of the inherent safety in computer simulation by extending the Deep Q-Network (DQN) algorithm with an ability to measure and take risk. In essence, we propose a novel DRL algorithm that encourages risk-seeking behaviour to enhance information acquisition during training. We demonstrate the merit of the exploration heuristic by (ⅰ) arguing that our risk estimator implicitly contains both parametric uncertainty and inherent uncertainty of the environment which are propagated back through Temporal Difference error across many time steps and (ⅱ) evaluating our method on three games in the Atari domain and showing that the technique works well on Montezuma's Revenge, a game that epitomises the challenge of sparse reward.
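The abstract does not spell out how risk is measured or how it enters action selection, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes a tabular stand-in for DQN, a risk estimate learned by bootstrapping on the squared Temporal Difference error, and a risk-seeking score of the form Q + beta * sqrt(risk). The class name `RiskSeekingQ`, the `risk` table, and the parameters `alpha`, `gamma`, and `beta` are all hypothetical.

```python
import math
import random
from collections import defaultdict


class RiskSeekingQ:
    """Tabular stand-in for a risk-seeking Q-learner (illustrative assumption only)."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.99, beta=1.0):
        self.n_actions = n_actions
        self.alpha = alpha    # learning rate
        self.gamma = gamma    # discount factor
        self.beta = beta      # weight of the risk bonus (assumed form)
        self.q = defaultdict(lambda: [0.0] * n_actions)     # value estimates
        self.risk = defaultdict(lambda: [0.0] * n_actions)  # variance-like estimates

    def score(self, state, action):
        # Risk-seeking score: value plus a bonus that grows with estimated risk.
        risk = max(self.risk[state][action], 0.0)
        return self.q[state][action] + self.beta * math.sqrt(risk)

    def act(self, state):
        # Greedy with respect to the risk-seeking score; ties broken at random.
        scores = [self.score(state, a) for a in range(self.n_actions)]
        best = max(scores)
        return random.choice([a for a, s in enumerate(scores) if s == best])

    def update(self, state, action, reward, next_state, done):
        next_q = 0.0 if done else max(self.q[next_state])
        next_risk = 0.0 if done else max(self.risk[next_state])

        # Standard Q-learning step.
        td_error = reward + self.gamma * next_q - self.q[state][action]
        self.q[state][action] += self.alpha * td_error

        # Assumed risk update: the squared TD error acts as a "reward" that is
        # bootstrapped, so risk propagates backwards over many time steps.
        risk_target = td_error ** 2 + self.gamma ** 2 * next_risk
        self.risk[state][action] += self.alpha * (risk_target - self.risk[state][action])
        return td_error
```

In this sketch, a state-action pair with a large or noisy TD error accumulates a high risk value and is therefore preferred during training, which is one way to read "risk-seeking behaviour to enhance information acquisition". Setting `beta=0` recovers ordinary greedy Q-learning.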

Bibliographic details

Source: International Conference on Simulation of Adaptive Behavior, 2018, pp. 201-211 (11 pages)
Authors: Nat Dilokthanakul; Murray Shanahan
Original format: PDF
Keywords: Deep reinforcement learning; Risk-sensitive; Exploration

