International Conference on Smart Grid and Electrical Automation

Fundamental Q-learning Algorithm in Finding Optimal Policy

Abstract

This paper trains an agent with Q-learning, an off-policy temporal-difference (TD) control method, to find the optimal policy for reaching a terminal state, and examines five factors that affect learning efficiency and results. To learn the function Q, which estimates how good it is to take a given action in the current state, the agent must try every possible state and every available action and summarize the outcomes as it learns. Learning therefore relies on two main behaviors: exploration and utilization (more commonly called exploitation). Exploration tries actions that have not yet been tried, with the aim of discovering better ones. Utilization follows the best policy known so far, choosing actions according to the information already discovered.
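
The balance between exploration and utilization described above can be illustrated with a minimal tabular Q-learning sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the one-dimensional corridor environment, the reward of 1 on reaching the terminal state, the epsilon-greedy rule, the function names `q_learning` and `greedy_policy`, and the values of `alpha`, `gamma`, and `epsilon`. These parameters are not claimed to be the five factors the paper studies.

```python
import random


def q_learning(n_states=6, episodes=500, alpha=0.5, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor (illustrative only).

    States are 0..n_states-1 and the last state is terminal.
    Actions: 0 = move left, 1 = move right.
    Reward: 1.0 on entering the terminal state, otherwise 0.0.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    # Q-table: one [left, right] pair of action values per state.
    q = [[0.0, 0.0] for _ in range(n_states)]

    for _ in range(episodes):
        state = 0
        while state != goal:
            if rng.random() < epsilon:
                # Exploration: pick a random action.
                action = rng.randrange(2)
            else:
                # Utilization: pick a best-known action, ties broken randomly.
                best = max(q[state])
                action = rng.choice([a for a in (0, 1) if q[state][a] == best])

            # Deterministic transition; moving left from state 0 stays put.
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0

            # Off-policy TD update: bootstrap from the greedy value of the
            # next state, regardless of which action is taken there.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Return the greedy action for every non-terminal state."""
    return [max((0, 1), key=lambda a: row[a]) for row in q[:-1]]


if __name__ == "__main__":
    table = q_learning()
    print(greedy_policy(table))
```

In this toy corridor the learned greedy policy should choose "move right" in every non-terminal state, since that is the only way to reach the terminal state. Setting `epsilon` to 0 removes the random exploration branch; the agent then relies only on random tie-breaking among equal Q-values to discover the goal.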