首页>
外国专利>
SECURE EXPLORATION FOR REINFORCEMENT LEARNING
SECURE EXPLORATION FOR REINFORCEMENT LEARNING
展开▼
机译:加强学习的安全探索
展开▼
页面导航
摘要
著录项
相似文献
摘要
A secured exploration agent for reinforcement learning (RL) is provided. Securitizing an exploration agent includes training the exploration agent to avoid dead-end states and dead-end trajectories. During training, the exploration agent “learns” to identify and avoid dead-end states of a Markov Decision Process (MDP). The secured exploration agent is utilized to safely and efficiently explore the environment, while significantly reducing the training time, as well as the cost and safety concerns associated with conventional RL. The secured exploration agent is employed to guide the behavior of a corresponding exploitation agent. During training, a policy of the exploration agent is iteratively updated to reflect an estimated probability that a state is a dead-end state. The probability, via the exploration policy, that the exploration agent chooses an action that results in a transition to a dead-end state is reduced to reflect the estimated probability that the state is a dead-end state.
展开▼