Actor-Critic Algorithms with Online Feature Adaptation

K. J. PRABUCHANDRAN; SHALABH BHATNAGAR; VIVEK S. BORKAR


Abstract

We develop two new online actor-critic control algorithms with adaptive feature tuning for Markov Decision Processes (MDPs). One of our algorithms is proposed for the long-run average cost objective, while the other works for discounted cost MDPs. Our actor-critic architecture incorporates parameterization both in the policy and the value function. A gradient search in the policy parameters is performed to improve the performance of the actor. The computation of the aforementioned gradient, however, requires an estimate of the value function of the policy corresponding to the current actor parameter. The value function, on the other hand, is approximated using linear function approximation and obtained from the critic. The error in approximation of the value function, however, results in suboptimal policies. In our article, we also update the features by performing a gradient descent on the Grassmannian of features to minimize a mean square Bellman error objective in order to find the best features. The aim is to obtain a good approximation of the value function and thereby ensure convergence of the actor to locally optimal policies. In order to estimate the gradient of the objective in the case of the average cost criterion, we utilize the policy gradient theorem, while in the case of the discounted cost objective, we utilize the simultaneous perturbation stochastic approximation (SPSA) scheme. We prove that our actor-critic algorithms converge to locally optimal policies. Experiments on two different settings show performance improvements resulting from our feature adaptation scheme.
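The abstract names simultaneous perturbation stochastic approximation (SPSA) as the gradient estimator for the discounted cost case. The sketch below illustrates the generic, textbook two-sided SPSA estimate on a toy deterministic cost. It is not the authors' algorithm: the function names `spsa_gradient` and `spsa_minimize`, the quadratic cost, and the constant step size are all assumptions made for illustration. In the paper's setting the cost would be a noisy estimate obtained from simulation, and diminishing step sizes would normally be used.

```python
import random


def spsa_gradient(cost, theta, delta=0.1, rng=None):
    """Two-sided SPSA gradient estimate (hypothetical helper, textbook form).

    All coordinates are perturbed at once by a random +/-1 vector, so only
    two cost evaluations are needed regardless of the dimension of theta.
    """
    rng = rng or random.Random(0)
    perturb = [rng.choice((-1.0, 1.0)) for _ in theta]
    plus = [t + delta * p for t, p in zip(theta, perturb)]
    minus = [t - delta * p for t, p in zip(theta, perturb)]
    diff = cost(plus) - cost(minus)
    # Coordinate i of the estimate divides by its own perturbation sign.
    return [diff / (2.0 * delta * p) for p in perturb]


def spsa_minimize(cost, theta, steps=500, step_size=0.05, delta=0.1, seed=0):
    """Plain gradient descent driven by SPSA estimates (illustrative only)."""
    rng = random.Random(seed)
    theta = list(theta)
    for _ in range(steps):
        grad = spsa_gradient(cost, theta, delta=delta, rng=rng)
        theta = [t - step_size * g for t, g in zip(theta, grad)]
    return theta


if __name__ == "__main__":
    # Toy stand-in for a policy cost: squared distance to an assumed optimum.
    target = [1.0, -2.0]

    def toy_cost(th):
        return sum((t - c) ** 2 for t, c in zip(th, target))

    print(spsa_minimize(toy_cost, [0.0, 0.0]))
```

The appeal of this estimator in an actor-critic loop is that its cost per update does not grow with the number of policy parameters, at the price of a noisier gradient direction on any single step.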


Bibliographic details

Source: ACM Transactions on Modeling and Computer Simulation, 2016, Issue 4, pp. 24.1-24.26 (26 pages)

Authors: K. J. PRABUCHANDRAN; SHALABH BHATNAGAR; VIVEK S. BORKAR

Author affiliations:

Department of Computer Science & Automation, Indian Institute of Science, Bangalore 560012

Department of Computer Science & Automation, Indian Institute of Science, Bangalore 560012

Department of Electrical Engineering, Indian Institute of Technology, Bombay, Powai, Mumbai 400076

Original format: PDF
Text language: English

