International Conference on Communications, Information System and Computer Engineering

Research on Proximal Policy Optimization Algorithm Based on N-step Update


Abstract

The PPO algorithm is updated by temporal-difference (TD) learning. Although this is more stable than a Monte Carlo update, it greatly increases the iteration cost and makes good convergence hard to guarantee. To address these problems, an improved algorithm with N-step updates, called n-PPO, is proposed. Specifically, the algorithm retains the strengths of the TD update, namely a broad exploration space and flexible, fast value estimation, while also drawing on the advantages of the Monte Carlo update over complete state sequences: accurate estimates, fewer iterations, and fast convergence. Experimental results show that the proposed method reduces the volatility and variance of the data while still guaranteeing correct convergence.
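The full text is not reproduced on this page, but the trade-off the abstract describes is captured by the standard n-step return, which interpolates between the one-step TD target (n = 1) and the full Monte Carlo return (n at least the episode length). The sketch below shows how such a target could be computed for PPO-style advantage estimates; the function name n_step_returns and all parameter values are illustrative assumptions, not the paper's code.

```python
import numpy as np

def n_step_returns(rewards, values, gamma=0.99, n=5):
    """N-step return targets G_t for one trajectory.

    rewards: r_0 ... r_{T-1} collected along the trajectory
    values:  critic estimates V(s_0) ... V(s_T), length T + 1;
             V(s_T) should be 0 if s_T is terminal
    For each t, with h = min(n, T - t):
        G_t = sum_{k=0}^{h-1} gamma^k * r_{t+k} + gamma^h * V(s_{t+h})
    n = 1 recovers the one-step TD target; n >= T recovers the
    Monte Carlo return, so n interpolates between the two regimes.
    """
    T = len(rewards)
    returns = np.zeros(T)
    for t in range(T):
        h = min(n, T - t)          # steps available before the trajectory ends
        g = values[t + h]          # bootstrap from V(s_{t+h})
        for k in reversed(range(h)):
            g = rewards[t + k] + gamma * g  # fold in discounted rewards
        returns[t] = g
    return returns

# Toy 4-step trajectory; values[-1] = 0 marks a terminal state.
rewards = np.array([1.0, 0.0, 1.0, 1.0])
values  = np.array([0.5, 0.4, 0.6, 0.7, 0.0])
adv = n_step_returns(rewards, values, gamma=0.99, n=2) - values[:-1]
```

In a PPO loss these targets would replace the one-step TD target: a larger n shifts the estimate toward the Monte Carlo end (lower bias, higher reliance on complete sequences), while a smaller n leans on the critic's bootstrap (faster, more flexible estimation), which is the balance the abstract attributes to n-PPO.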