Annual American Control Conference

Bias Correction in Reinforcement Learning via the Deterministic Policy Gradient Method for MPC-Based Policies


Abstract

In this paper, we discuss the implementation of the Deterministic Policy Gradient using the Actor-Critic technique based on linear compatible advantage function approximations in the context of constrained policies. We focus on MPC-based policies, though the discussion is general. We show that in that context, the classic linear compatible advantage function approximation fails to deliver a correct policy gradient because the exploration becomes distorted by the constraints, and we propose a generalized linear compatible advantage function approximation that corrects the problem. We show that this correction requires an estimate of the mean and covariance of the constrained exploration. The validity of that generalization is formally established and demonstrated on a simple example.
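As background for the abstract's terminology, the display below recalls the deterministic policy gradient and the classic linear compatible advantage function approximation of Silver et al. (2014). It is a sketch of the standard, unconstrained setting, not the paper's generalized approximation, and the notation is illustrative rather than taken from the paper.

\[
\nabla_\theta J(\theta) \;=\; \mathbb{E}_{s}\!\left[\nabla_\theta \pi_\theta(s)\,\nabla_a Q^{\pi_\theta}(s,a)\big|_{a=\pi_\theta(s)}\right],
\qquad
A_w(s,a) \;=\; \big(a-\pi_\theta(s)\big)^{\top}\nabla_\theta \pi_\theta(s)^{\top} w,
\]
\[
\text{so that, under the compatibility condition } \nabla_a A_w(s,a)=\nabla_\theta \pi_\theta(s)^{\top} w,\qquad
\nabla_\theta J(\theta) \;\approx\; \mathbb{E}_{s}\!\left[\nabla_\theta \pi_\theta(s)\,\nabla_\theta \pi_\theta(s)^{\top}\right] w .
\]

When the critic weights w are fitted by least squares on exploratory perturbations e = a - \pi_\theta(s) that are zero-mean with a known covariance, the last expression is a consistent estimate of the policy gradient. If the constraints of an MPC-based policy clip or project the exploratory action, e is no longer zero-mean with that covariance; this is presumably the distortion the abstract refers to, and why the proposed correction requires estimates of the mean and covariance of the constrained exploration.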

