Conference: Applying New Technology in Green Buildings

Multi-Objective Exploration for Proximal Policy Optimization

Abstract

In reinforcement learning, the reward is one of the main components used to optimize the policy. Whereas other approaches rely on a single scalar reward to obtain an optimal policy, we propose a model that learns the designated reward under numerous conditions. Our method, which we call multi-objective exploration for proximal policy optimization (MOE-PPO), alleviates the dependence on reward design by employing the Preferent Surrogate Objective (PSO). We also make full use of curiosity-driven exploration to increase exploration ability. Our experiments test MOE-PPO in the Super Mario Bros environment designed with OpenAI Gym under three criteria to illustrate the approach's effectiveness. The results show that MOE-PPO outperforms other on-policy algorithms under many conditions.
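
The abstract names three ingredients: PPO's clipped surrogate objective, a combination of several reward signals, and a curiosity-driven intrinsic bonus. The paper's Preferent Surrogate Objective is not specified here, so the sketch below is only a plausible illustration of how per-objective advantages (e.g. game score and a curiosity bonus) might be scalarized by a preference vector and fed into the standard PPO loss; the function names and weighting scheme are assumptions, not the authors' implementation.

```python
import torch

def clipped_surrogate_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    # Standard PPO clipped surrogate objective, returned as a loss to minimize.
    ratio = torch.exp(new_log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()

def blended_advantage(objective_advantages, preference_weights):
    # Scalarize per-objective advantages with a preference vector
    # (an assumed stand-in for combining multiple objectives, not the paper's PSO).
    # objective_advantages: (num_objectives, batch); preference_weights: (num_objectives,)
    return torch.einsum("o,ob->b", preference_weights, objective_advantages)

# Example with random placeholder data: one extrinsic objective (game score)
# and one intrinsic, curiosity-style objective (novelty bonus).
batch, num_objectives = 8, 2
old_log_probs = torch.randn(batch)
new_log_probs = old_log_probs + 0.05 * torch.randn(batch)
per_objective_advantages = torch.randn(num_objectives, batch)
preference = torch.tensor([0.7, 0.3])  # hypothetical preference over the two objectives
loss = clipped_surrogate_loss(new_log_probs, old_log_probs,
                              blended_advantage(per_objective_advantages, preference))
print(float(loss))
```

In a full implementation the intrinsic signal would come from a learned curiosity model (e.g. prediction error of a forward dynamics model) rather than the random placeholders used here.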
