International Conference on Communications, Information System and Computer Engineering

Research on Proximal Policy Optimization Algorithm Based on N-step Update


Abstract

The PPO algorithm is updated by temporal-difference (TD) learning. Although this is more stable than a Monte Carlo update, it greatly increases the iteration cost and makes good convergence hard to guarantee. To address these problems, an improved algorithm with N-step updates, called n-PPO, is proposed. Specifically, the algorithm retains the strengths of the TD update, namely a broad exploration space and flexible, fast value estimation, while also drawing on the advantages of the Monte Carlo update over complete state sequences: accurate estimates, fewer iterations, and fast convergence. Experimental results show that the proposed method reduces the volatility and variance of the data while still guaranteeing correct convergence.
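The full text is not reproduced on this page, but the trade-off the abstract describes is captured by the standard n-step return, which interpolates between the one-step TD target (n = 1) and the full Monte Carlo return (n at least the episode length). The sketch below shows how such a target could be computed for PPO-style advantage estimates; the function name n_step_returns and all parameter values are illustrative assumptions, not the paper's code.

```python
import numpy as np

def n_step_returns(rewards, values, gamma=0.99, n=5):
    """N-step return targets G_t for one trajectory.

    rewards: r_0 ... r_{T-1} collected along the trajectory
    values:  critic estimates V(s_0) ... V(s_T), length T + 1;
             V(s_T) should be 0 if s_T is terminal
    For each t, with h = min(n, T - t):
        G_t = sum_{k=0}^{h-1} gamma^k * r_{t+k} + gamma^h * V(s_{t+h})
    n = 1 recovers the one-step TD target; n >= T recovers the
    Monte Carlo return, so n interpolates between the two regimes.
    """
    T = len(rewards)
    returns = np.zeros(T)
    for t in range(T):
        h = min(n, T - t)          # steps available before the trajectory ends
        g = values[t + h]          # bootstrap from V(s_{t+h})
        for k in reversed(range(h)):
            g = rewards[t + k] + gamma * g  # fold in discounted rewards
        returns[t] = g
    return returns

# Toy 4-step trajectory; values[-1] = 0 marks a terminal state.
rewards = np.array([1.0, 0.0, 1.0, 1.0])
values  = np.array([0.5, 0.4, 0.6, 0.7, 0.0])
adv = n_step_returns(rewards, values, gamma=0.99, n=2) - values[:-1]
```

In a PPO loss these targets would replace the one-step TD target: a larger n shifts the estimate toward the Monte Carlo end (lower bias, higher reliance on complete sequences), while a smaller n leans on the critic's bootstrap (faster, more flexible estimation), which is the balance the abstract attributes to n-PPO.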