Fundamental Q-learning Algorithm in Finding Optimal Policy


Abstract

Based on off-policy TD control (Q-learning), this paper trains an agent by reinforcement learning to find the optimal policy for reaching the terminal state, and explores five factors that affect learning efficiency and results. To learn the function Q, which estimates how good or bad it is to take the current action, the agent must try every possible state and every alternative action and summarize the outcomes as it learns. The learning process therefore relies on two main methods: exploration and utilization (usually called exploitation). Exploration tries new, previously untried actions with the aim of discovering better ones. Utilization follows the best policy known so far, choosing actions according to the information already discovered.
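The trade-off between exploration and utilization described above can be sketched with tabular Q-learning and an epsilon-greedy rule. This is a minimal illustration, not the paper's implementation: the corridor environment, the function names train_q_learning and greedy_policy, and all parameter values (alpha, gamma, epsilon, episode count) are assumptions chosen for the example. The update used is the standard Q-learning rule Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).

```python
import random


def train_q_learning(n_states=6, episodes=500, alpha=0.5, gamma=0.9,
                     epsilon=0.2, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor (assumed setup).

    States are 0..n_states-1 and the last state is terminal.
    Action 0 moves left, action 1 moves right. Entering the terminal
    state yields reward 1; every other step yields reward 0.
    """
    rng = random.Random(seed)
    terminal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]

    def step(state, action):
        # Moving left from state 0 leaves the agent in state 0.
        next_state = max(0, state - 1) if action == 0 else state + 1
        reward = 1.0 if next_state == terminal else 0.0
        return next_state, reward

    for _ in range(episodes):
        state = 0
        while state != terminal:
            if rng.random() < epsilon:
                # Exploration: try a random action.
                action = rng.randrange(2)
            else:
                # Utilization: pick a best-known action, ties broken randomly.
                best = max(q[state])
                action = rng.choice([a for a in (0, 1) if q[state][a] == best])
            next_state, reward = step(state, action)
            # Off-policy TD target: bootstrap from the greedy next action,
            # regardless of which action the agent will actually take next.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best-known action for each non-terminal state."""
    return [max((0, 1), key=lambda a: row[a]) for row in q[:-1]]


if __name__ == "__main__":
    q_table = train_q_learning()
    print(greedy_policy(q_table))
```

With these assumed settings the learned greedy policy should choose "right" (action 1) in every non-terminal state. Setting epsilon to 0 removes exploration entirely, while a larger epsilon spends more steps on random actions.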


Bibliographic details

Source: International Conference on Smart Grid and Electrical Automation | 2017 | 723p | 4 pages
Author: Canyu Sun
Original format: PDF
Chinese Library Classification: TM76-53
Keywords: Learning (artificial intelligence); Flowcharts; Machine learning algorithms; Presses; Smart grids; Automation; Software algorithms


Similar literature

1. Efficient Algorithms for Finding Optimal Power-of-2 Policies for Production/Distribution Systems with General Joint Setup Costs [J]. Federgruen A., Zheng YS. Operations Research: The Journal of the Operations Research Society of America. 1995, No. 3
2. Q-learning and policy iteration algorithms for stochastic shortest path problems [J]. Huizhen Yu, Dimitri P. Bertsekas. Annals of Operations Research. 2013, No. 1
3. Q-learning and policy iteration algorithms for stochastic shortest path problems [J]. Huizhen Yu, Dimitri P. Bertsekas. Annals of Operations Research. 2013, No. sepa
4. Fundamental Q-learning Algorithm in Finding Optimal Policy [C]. Canyu Sun. International Conference on Smart Grid and Electrical Automation. 2017
5. Models and algorithms for addressing travel time variability: Applications from optimal path finding and traffic equilibrium problems [D]. Zhou, Zhong. 2008
6. A First Step Towards Behavioral Coaching for Managing Stress: A Case Study on Optimal Policy Estimation with Multi-stage Threshold Q-learning [O]. Xinyu Hu, Pei-Yun S. Hsueh, Ching-Hua Chen. 2017
7. Learning Automata based Multi-agent System Algorithms for Finding Optimal Policies in Markov Games [O]. B. Masoumi, M. R. Meybodi. 2014
