
Model-building semi-Markov adaptive critics

Abstract

Adaptive critics, or actor-critics, are a class of reinforcement learning (RL) or approximate dynamic programming (ADP) algorithms in which one searches over stochastic policies in order to determine the optimal deterministic policy. Classically, these algorithms have been studied for Markov decision processes (MDPs) in the context of model-free updates, in which transition probabilities are avoided altogether. A model-free version for the semi-MDP (SMDP) under discounted reward, in which the time taken by each transition can be a random variable, was proposed in Gosavi [1]. In this paper, we propose a variant in which the transition probability model is built simultaneously with the value function and the action-probability functions. While our new algorithm does not require the transition probabilities a priori, it generates them along with the estimates of the value function and the action-probability functions required in adaptive critics. Model-building and model-based versions of algorithms have numerous advantages over their model-free counterparts. In particular, they are more stable and may require less training. However, the additional steps of building the model may require increased storage in the computer's memory. In addition to enumerating potential application areas for our algorithm, we analyze the advantages and disadvantages of model building.
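To make the idea concrete, below is a minimal sketch of a model-building adaptive critic for a discounted-reward SMDP. It assumes a tabular setting, softmax (Gibbs) action probabilities derived from actor preferences, a lump-sum reward earned at each transition, and continuous-time discounting of the form exp(-gamma*t) over a transition of duration t. The class name, step sizes, and update rules are illustrative assumptions, not the exact algorithm of the paper; the sketch only shows how transition counts, reward sums, and transition-time sums can be accumulated online into an empirical model that then drives the critic and actor updates.

```python
# A hedged sketch of a model-building SMDP adaptive critic: the transition
# model is estimated from observed transitions and used to form the critic
# target, while a softmax actor is adjusted by the resulting TD error.
import math
import random
from collections import defaultdict

class ModelBuildingSMDPCritic:
    def __init__(self, n_states, n_actions, gamma=0.1, alpha=0.1, beta=0.05):
        self.nS, self.nA = n_states, n_actions
        self.gamma = gamma   # continuous-time discount rate (assumed form)
        self.alpha = alpha   # critic (value-function) step size
        self.beta = beta     # actor (action-preference) step size
        self.V = [0.0] * n_states                                # critic
        self.H = [[0.0] * n_actions for _ in range(n_states)]    # actor
        # Model statistics, built online from observed transitions.
        self.count = defaultdict(int)      # (s, a, s') -> visit count
        self.total = defaultdict(int)      # (s, a)     -> total visits
        self.r_sum = defaultdict(float)    # (s, a, s') -> cumulative reward
        self.t_sum = defaultdict(float)    # (s, a, s') -> cumulative time

    def policy(self, s):
        """Softmax (Gibbs) action probabilities from actor preferences."""
        m = max(self.H[s])
        w = [math.exp(h - m) for h in self.H[s]]
        z = sum(w)
        return [x / z for x in w]

    def act(self, s):
        return random.choices(range(self.nA), weights=self.policy(s))[0]

    def observe(self, s, a, r, t, s2):
        """Fold one transition into the model, then update critic and actor."""
        self.count[(s, a, s2)] += 1
        self.total[(s, a)] += 1
        self.r_sum[(s, a, s2)] += r
        self.t_sum[(s, a, s2)] += t
        # Model-based critic target: expected discounted one-step return
        # under the transition probabilities estimated so far.
        target = 0.0
        for s_next in range(self.nS):
            n = self.count[(s, a, s_next)]
            if n == 0:
                continue
            p = n / self.total[(s, a)]
            r_bar = self.r_sum[(s, a, s_next)] / n
            t_bar = self.t_sum[(s, a, s_next)] / n
            target += p * (r_bar + math.exp(-self.gamma * t_bar) * self.V[s_next])
        delta = target - self.V[s]
        self.V[s] += self.alpha * delta    # critic update
        self.H[s][a] += self.beta * delta  # actor update
```

Note how the model statistics grow with the number of observed (s, a, s') triples; this is the extra memory cost that the abstract attributes to model building, traded against a more stable, expectation-based critic target.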
