Fundamental Q-learning Algorithm in Finding Optimal Policy

Abstract

This paper trains an agent by reinforcement learning, using the off-policy TD control method Q-learning, to find the optimal policy for reaching the terminal state, and it explores the five factors that affect learning efficiency and results. To learn the function Q, which estimates the pros and cons of taking the current action, the agent must try every possible state and every alternative action and summarize what it finds during learning. The learning process therefore relies on two main methods: exploration and utilization. Exploration tries new, previously untried actions with the aim of discovering better ones. Utilization follows the optimal policy, taking actions according to the information already discovered.
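
The balance between exploration and utilization (often called exploitation) can be illustrated with a minimal tabular Q-learning sketch. Nothing below is taken from the paper itself: the five-state corridor environment, the reward of 1 at the terminal state, the epsilon-greedy rule, and the values of alpha, gamma, and epsilon are all assumptions chosen for illustration.

```python
import random

# Hypothetical toy environment (not from the paper): a corridor of
# states 0..4 where state 4 is the terminal state.
N_STATES = 5
ACTIONS = (-1, +1)  # move left, move right
TERMINAL = N_STATES - 1


def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    next_state = min(max(state + action, 0), TERMINAL)
    done = next_state == TERMINAL
    reward = 1.0 if done else 0.0  # assumed reward scheme
    return next_state, reward, done


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)  # exploration: try a random action
    best = max(q[state].values())
    # utilization: pick a best-known action, breaking ties at random
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Run tabular Q-learning and return the learned Q table."""
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in ACTIONS} for s in range(N_STATES)}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Off-policy target: uses the greedy value of the next state,
            # regardless of which action the agent will actually take.
            future = 0.0 if done else gamma * max(q[next_state].values())
            q[state][action] += alpha * (reward + future - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best-known action for every non-terminal state."""
    return {s: max(q[s], key=q[s].get) for s in range(TERMINAL)}


if __name__ == "__main__":
    print(greedy_policy(train()))
```

In this toy setting the learned greedy policy should move right (+1) from every non-terminal state. Raising epsilon makes the agent try more untested actions, while lowering it makes the agent rely more on what it has already learned.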

Bibliographic details

Source: International Conference on Smart Grid and Electrical Automation, 2017, pp. 243-246 (4 pages)
Author: Canyu Sun
Original format: PDF
Keywords: Learning (artificial intelligence); Flowcharts; Machine learning algorithms; Presses; Smart grids; Automation; Software algorithms
Date indexed: 2022-08-26 15:18:52

Similar literature

1. Efficient Algorithms for Finding Optimal Power-of-2 Policies for Production/Distribution Systems with General Joint Setup Costs [J]. Federgruen A., Zheng YS. Operations Research: The Journal of the Operations Research Society of America. 1995, No. 3
2. Q-learning and policy iteration algorithms for stochastic shortest path problems [J]. Huizhen Yu, Dimitri P. Bertsekas. Annals of Operations Research. 2013, No. 1
3. Q-learning and policy iteration algorithms for stochastic shortest path problems [J]. Huizhen Yu, Dimitri P. Bertsekas. Annals of Operations Research. 2013, No. sepa
4. Fundamental Q-learning Algorithm in Finding Optimal Policy [C]. Canyu Sun. International Conference on Smart Grid and Electrical Automation. 2017
5. Models and algorithms for addressing travel time variability: Applications from optimal path finding and traffic equilibrium problems [D]. Zhou, Zhong. 2008
6. A First Step Towards Behavioral Coaching for Managing Stress: A Case Study on Optimal Policy Estimation with Multi-stage Threshold Q-learning [O]. Xinyu Hu, Pei-Yun S. Hsueh, Ching-Hua Chen. 2017
7. Learning Automata based Multi-agent System Algorithms for Finding Optimal Policies in Markov Games [O]. B. Masoumi, M. R. Meybodi. 2014
