Annual American Control Conference

Bias Correction in Reinforcement Learning via the Deterministic Policy Gradient Method for MPC-Based Policies


Abstract

In this paper, we discuss the implementation of the Deterministic Policy Gradient using the Actor-Critic technique based on linear compatible advantage function approximations in the context of constrained policies. We focus on MPC-based policies, though the discussion is general. We show that in that context, the classic linear compatible advantage function approximation fails to deliver a correct policy gradient because the exploration becomes distorted by the constraints, and we propose a generalized linear compatible advantage function approximation that corrects the problem. We show that this correction requires an estimate of the mean and covariance of the constrained exploration. The validity of that generalization is formally established and demonstrated on a simple example.
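As background for the abstract's terminology, the display below recalls the deterministic policy gradient and the classic linear compatible advantage function approximation of Silver et al. (2014). It is a sketch of the standard, unconstrained setting, not the paper's generalized approximation, and the notation is illustrative rather than taken from the paper.

\[
\nabla_\theta J(\theta) \;=\; \mathbb{E}_{s}\!\left[\nabla_\theta \pi_\theta(s)\,\nabla_a Q^{\pi_\theta}(s,a)\big|_{a=\pi_\theta(s)}\right],
\qquad
A_w(s,a) \;=\; \big(a-\pi_\theta(s)\big)^{\top}\nabla_\theta \pi_\theta(s)^{\top} w,
\]
\[
\text{so that, under the compatibility condition } \nabla_a A_w(s,a)=\nabla_\theta \pi_\theta(s)^{\top} w,\qquad
\nabla_\theta J(\theta) \;\approx\; \mathbb{E}_{s}\!\left[\nabla_\theta \pi_\theta(s)\,\nabla_\theta \pi_\theta(s)^{\top}\right] w .
\]

When the critic weights w are fitted by least squares on exploratory perturbations e = a - \pi_\theta(s) that are zero-mean with a known covariance, the last expression is a consistent estimate of the policy gradient. If the constraints of an MPC-based policy clip or project the exploratory action, e is no longer zero-mean with that covariance; this is presumably the distortion the abstract refers to, and why the proposed correction requires estimates of the mean and covariance of the constrained exploration.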

