Model-building semi-Markov adaptive critics

Abstract

Adaptive or actor critics are a class of reinforcement learning (RL) or approximate dynamic programming (ADP) algorithms in which one searches over stochastic policies in order to determine the optimal deterministic policy. Classically, these algorithms have been studied for Markov decision processes (MDPs) in the context of model-free updates, in which transition probabilities are avoided altogether. A model-free version for the semi-MDP (SMDP) under discounted reward, in which the transition time of each transition can be a random variable, was proposed in Gosavi [1]. In this paper, we propose a variant in which the transition probability model is built simultaneously with the value function and action-probability functions. While our new algorithm does not require the transition probabilities a priori, it generates them along with the estimation of the value function and the action-probability functions required in adaptive critics. Model-building and model-based versions of algorithms have numerous advantages over their model-free counterparts. In particular, they are more stable and may require less training. However, the additional steps of building the model may require increased storage in the computer's memory. In addition to enumerating potential application areas for our algorithm, we will analyze the advantages and disadvantages of model building.
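
The abstract only sketches the algorithm at a high level, so the following is a minimal tabular illustration of the model-building idea under stated assumptions, not the paper's actual method. Transition counts, mean one-step rewards, and mean discount factors exp(-gamma*tau) for the random transition times tau are accumulated online, the critic is updated through these estimates rather than from raw samples, and a Gibbs (softmax) actor maintains the action-probability functions. All names, the step sizes alpha and beta, and the softmax form of the actor are illustrative choices.

```python
import math
import random
from collections import defaultdict


class ModelBuildingSMDPCritic:
    """Hypothetical tabular sketch of a model-building adaptive critic
    for a discounted-reward SMDP (not the algorithm from the paper)."""

    def __init__(self, n_states, n_actions, gamma=0.1):
        self.nS, self.nA = n_states, n_actions
        self.gamma = gamma                                      # continuous-time discount rate
        self.V = [0.0] * n_states                               # critic: value function
        self.H = [[0.0] * n_actions for _ in range(n_states)]   # actor: action preferences
        # model built online from observed transitions
        self.count = defaultdict(int)     # visits to (s, a)
        self.trans = defaultdict(int)     # counts of (s, a, s')
        self.rew = defaultdict(float)     # cumulative immediate reward for (s, a)
        self.disc = defaultdict(float)    # cumulative exp(-gamma * tau) for (s, a)

    def policy(self, s):
        # Gibbs (softmax) distribution: the search over stochastic policies
        m = max(self.H[s])
        w = [math.exp(h - m) for h in self.H[s]]
        z = sum(w)
        return [x / z for x in w]

    def act(self, s):
        return random.choices(range(self.nA), weights=self.policy(s))[0]

    def update(self, s, a, r, tau, s_next, alpha=0.1, beta=0.01):
        # 1) build the model incrementally from the observed transition
        self.count[(s, a)] += 1
        self.trans[(s, a, s_next)] += 1
        self.rew[(s, a)] += r
        self.disc[(s, a)] += math.exp(-self.gamma * tau)        # SMDP discounting over random tau

        # 2) critic update through the estimated model rather than the raw sample
        n = self.count[(s, a)]
        r_bar = self.rew[(s, a)] / n                            # estimated one-step reward
        d_bar = self.disc[(s, a)] / n                           # estimated E[exp(-gamma * tau)]
        ev = sum(c / n * self.V[sp]                             # estimated E[V(s') | s, a]
                 for (si, ai, sp), c in self.trans.items() if si == s and ai == a)
        td = r_bar + d_bar * ev - self.V[s]
        self.V[s] += alpha * td

        # 3) actor update: raise the preference of actions with positive TD error
        self.H[s][a] += beta * td
```

A driving loop would repeatedly call act(s) on a simulator, observe the reward r, transition time tau, and next state, and pass them to update; the trans/count tables are the transition probability model that the abstract says is generated along with the value and action-probability functions.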