Conference: International Conference on Smart Grid and Electrical Automation

Fundamental Q-learning Algorithm in Finding Optimal Policy

Abstract

Based on Q-learning, an off-policy temporal-difference (TD) control method, this paper trains an agent by reinforcement learning to find the optimal policy for reaching the terminal state, and explores five factors that affect learning efficiency and the results. To learn the function Q, which estimates the merits and drawbacks of taking the current action, the agent must try every possible state and every alternative action and summarize what it observes during learning. The learning process therefore relies on two main methods: exploration and utilization (more commonly called exploitation). Exploration tries new, previously untried actions with the aim of discovering better ones. Utilization follows the best policy found so far, taking actions according to the information already discovered.
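The exploration/utilization trade-off described in the abstract can be illustrated with a minimal tabular Q-learning sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the one-dimensional corridor environment, the reward of 1.0 at the terminal state, the hyperparameter values (alpha, gamma, epsilon), and the function names q_learning and greedy_policy. Exploration is implemented with the common epsilon-greedy rule, which may differ from the scheme the authors used.

```python
import random


def q_learning(n_states=6, episodes=500, alpha=0.5, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor (illustrative only).

    States are 0 .. n_states-1 and the last state is terminal.
    Actions: 0 = step left, 1 = step right.
    Reward is 1.0 on entering the terminal state and 0.0 otherwise.
    """
    rng = random.Random(seed)
    n_actions = 2
    terminal = n_states - 1
    q = [[0.0] * n_actions for _ in range(n_states)]

    for _ in range(episodes):
        state = 0
        while state != terminal:
            if rng.random() < epsilon:
                # Exploration: try an action regardless of its current estimate.
                action = rng.randrange(n_actions)
            else:
                # Utilization (exploitation): take the best-known action,
                # breaking ties at random so untrained states are not stuck.
                best = max(q[state])
                action = rng.choice(
                    [a for a in range(n_actions) if q[state][a] == best])

            # Assumed corridor dynamics: the left wall blocks movement.
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == terminal else 0.0

            # Off-policy TD update: the target uses the greedy value of the
            # next state, independent of the action actually taken next.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Return the highest-valued action for each non-terminal state."""
    return [max(range(len(row)), key=row.__getitem__) for row in q[:-1]]
```

Calling greedy_policy(q_learning()) yields the learned action for each non-terminal state. Setting epsilon to 0 removes the random exploration step, which is one simple way to observe how the balance between the two methods changes learning in this toy setting.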