
SECURE EXPLORATION FOR REINFORCEMENT LEARNING



Abstract

A secured exploration agent for reinforcement learning (RL) is provided. Securitizing an exploration agent includes training the exploration agent to avoid dead-end states and dead-end trajectories. During training, the exploration agent “learns” to identify and avoid dead-end states of a Markov Decision Process (MDP). The secured exploration agent is utilized to safely and efficiently explore the environment, while significantly reducing the training time, as well as the cost and safety concerns associated with conventional RL. The secured exploration agent is employed to guide the behavior of a corresponding exploitation agent. During training, a policy of the exploration agent is iteratively updated to reflect an estimated probability that a state is a dead-end state. The probability, via the exploration policy, that the exploration agent chooses an action that results in a transition to a dead-end state is reduced to reflect the estimated probability that the state is a dead-end state.
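The update described in the abstract can be illustrated with a minimal tabular sketch. This is not the patent's claimed implementation: the function names (`secure_policy`, `update_dead_end_estimate`, `train`), the learning rate, the proportional weighting rule, and the three-state toy MDP are all assumptions made for illustration. The sketch keeps a per-action estimate of the probability that an action leads into a dead end, and lowers the exploration policy's probability of picking that action accordingly.

```python
import random

# Hypothetical toy MDP (an assumption, not from the patent).
# (state, action) -> (next_state, failed, done)
TRANSITIONS = {
    (0, 0): (2, False, True),   # safe action: reaches the goal terminal
    (0, 1): (1, False, False),  # risky action: enters dead-end state 1
    (1, 0): (1, True, True),    # every action from state 1 ends in failure
    (1, 1): (1, True, True),
}
NUM_STATES, NUM_ACTIONS = 3, 2


def secure_policy(p_dead_row):
    """Weight each action by 1 minus its estimated dead-end probability."""
    weights = [1.0 - p for p in p_dead_row]
    total = sum(weights)
    if total <= 0.0:
        # Every action looks like a dead end, so no action is preferable.
        return [1.0 / len(p_dead_row)] * len(p_dead_row)
    return [w / total for w in weights]


def update_dead_end_estimate(p_dead, state, action, next_state,
                             failed, done, lr=0.5):
    """Move the estimate for (state, action) toward a one-step target."""
    if failed:
        target = 1.0  # transition ended in an undesired terminal state
    elif done:
        target = 0.0  # transition ended safely
    else:
        # A state is a dead end only if even its best action is one.
        target = min(p_dead[next_state])
    p_dead[state][action] += lr * (target - p_dead[state][action])


def train(episodes=200, seed=0):
    """Explore with the secured policy while updating the estimates."""
    rng = random.Random(seed)
    p_dead = [[0.0] * NUM_ACTIONS for _ in range(NUM_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            probs = secure_policy(p_dead[state])
            action = rng.choices(range(NUM_ACTIONS), weights=probs)[0]
            next_state, failed, done = TRANSITIONS[(state, action)]
            update_dead_end_estimate(p_dead, state, action,
                                     next_state, failed, done)
            state = next_state
    return p_dead


if __name__ == "__main__":
    estimates = train()
    print("dead-end estimates at state 0:", estimates[0])
    print("exploration policy at state 0:", secure_policy(estimates[0]))
```

In this sketch the exploration policy starts uniform and shifts probability away from the risky action as its dead-end estimate grows. A separate exploitation agent, as mentioned in the abstract, would be guided by this policy rather than replaced by it.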


Bibliographic details

Publication number: US2020076857A1

Patent type:
Publication date: 2020-03-05

Original format: PDF
Applicant/patentee: MICROSOFT TECHNOLOGY LICENSING LLC

Application number: US201916554525
Inventors: HARM HENDRIK VAN SEIJEN; SEYED MEHDI FATEMI BOOSHEHRI

Filing date: 2019-08-28
Classification: H04L29/06; G06N5/04; G06N20
Country: US
Database entry time: 2022-08-21 11:18:59
