Conference: Applying New Technology in Green Buildings

Multi-Objective Exploration for Proximal Policy Optimization

Abstract

In reinforcement learning, the reward is one of the main components used to optimize the policy. Whereas other approaches rely on a single scalar reward to obtain an optimal policy, we propose a model that learns the designated reward under numerous conditions. Our method, which we call multi-objective exploration for proximal policy optimization (MOE-PPO), alleviates the dependence on reward design by employing the Preferent Surrogate Objective (PSO). We also make full use of curiosity-driven exploration to increase exploration ability. Our experiments test MOE-PPO in the Super Mario Bros environment designed with OpenAI Gym under three criteria to illustrate the approach's effectiveness. The results show that MOE-PPO outperforms other on-policy algorithms under many conditions.
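
The abstract names three ingredients: PPO's clipped surrogate objective, a combination of several reward signals, and a curiosity-driven intrinsic bonus. The paper's Preferent Surrogate Objective is not specified here, so the sketch below is only a plausible illustration of how per-objective advantages (e.g. game score and a curiosity bonus) might be scalarized by a preference vector and fed into the standard PPO loss; the function names and weighting scheme are assumptions, not the authors' implementation.

```python
import torch

def clipped_surrogate_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    # Standard PPO clipped surrogate objective, returned as a loss to minimize.
    ratio = torch.exp(new_log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()

def blended_advantage(objective_advantages, preference_weights):
    # Scalarize per-objective advantages with a preference vector
    # (an assumed stand-in for combining multiple objectives, not the paper's PSO).
    # objective_advantages: (num_objectives, batch); preference_weights: (num_objectives,)
    return torch.einsum("o,ob->b", preference_weights, objective_advantages)

# Example with random placeholder data: one extrinsic objective (game score)
# and one intrinsic, curiosity-style objective (novelty bonus).
batch, num_objectives = 8, 2
old_log_probs = torch.randn(batch)
new_log_probs = old_log_probs + 0.05 * torch.randn(batch)
per_objective_advantages = torch.randn(num_objectives, batch)
preference = torch.tensor([0.7, 0.3])  # hypothetical preference over the two objectives
loss = clipped_surrogate_loss(new_log_probs, old_log_probs,
                              blended_advantage(per_objective_advantages, preference))
print(float(loss))
```

In a full implementation the intrinsic signal would come from a learned curiosity model (e.g. prediction error of a forward dynamics model) rather than the random placeholders used here.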
