Applied Intelligence: The International Journal of Artificial Intelligence, Neural Networks, and Complex Problem-Solving Technologies

Combining a gradient-based method and an evolution strategy for multi-objective reinforcement learning



Abstract

Multi-objective reinforcement learning (MORL) algorithms aim to approximate the Pareto frontier uniformly in multi-objective decision-making problems. In deep reinforcement learning (RL), gradient-based methods are often adopted to learn deep policies/value functions because of their fast convergence, but purely gradient-based methods cannot guarantee a uniformly approximated Pareto frontier. On the other hand, evolution strategies operate directly in the solution space and can achieve a well-distributed Pareto frontier, yet applying evolution strategies to optimize deep networks remains challenging. To leverage the advantages of both kinds of methods, we propose a two-stage MORL framework that combines a gradient-based method with an evolution strategy. First, an efficient multi-policy soft actor-critic algorithm is proposed to learn multiple policies collaboratively; the lower layers of all policy networks are shared, so this first stage can be regarded as representation learning. Second, the multi-objective covariance matrix adaptation evolution strategy (MO-CMA-ES) is applied to fine-tune the policy-independent parameters, yielding a dense and uniform approximation of the Pareto frontier. Experimental results on three benchmarks (Deep Sea Treasure, Adaptive Streaming, and Super Mario Bros) show the superiority of the proposed method.
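To make the two-stage structure concrete, below is a minimal, hypothetical NumPy sketch (not the authors' code). The shared lower layer stands in for the representation that stage 1 would learn with the multi-policy soft actor-critic, and stage 2 perturbs only the per-policy head parameters, accepting a child unless its parent Pareto-dominates it — a deliberately simplified stand-in for MO-CMA-ES, which in reality adapts per-individual step sizes and covariance matrices. The toy reward table, network sizes, and hyperparameters are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, HIDDEN, N_ACTIONS, N_POLICIES = 4, 16, 3, 4

# Per-action reward vectors for a toy 2-objective problem: action 2 is
# dominated, while actions 0 and 1 trade the two objectives off.
REWARDS = np.array([[1.0, 0.2], [0.2, 1.0], [0.1, 0.1]])
EVAL_STATES = rng.normal(size=(64, STATE_DIM))  # fixed evaluation states


def init_shared():
    # Lower layer shared by all policies; in the paper this representation is
    # learned in stage 1 by the multi-policy soft actor-critic, then reused.
    return rng.normal(0.0, 0.5, (STATE_DIM, HIDDEN))


def init_head():
    # Policy-independent upper layer, one per policy: the parameters that the
    # evolution strategy fine-tunes in stage 2.
    return rng.normal(0.0, 0.5, (HIDDEN, N_ACTIONS))


def evaluate(shared, head):
    # Stand-in for an episode rollout: average 2-objective reward over states.
    hidden = np.tanh(EVAL_STATES @ shared)
    actions = np.argmax(hidden @ head, axis=1)
    return REWARDS[actions].mean(axis=0)


def dominates(a, b):
    return bool(np.all(a >= b) and np.any(a > b))


# Stage 1 (placeholder): gradient-based representation learning is skipped
# here; we only initialise the shared trunk and the per-policy heads.
shared = init_shared()
population = [init_head() for _ in range(N_POLICIES)]

# Stage 2 (simplified): mutate each head and keep the child unless the parent
# Pareto-dominates it -- a crude stand-in for MO-CMA-ES selection.
for generation in range(50):
    for i, head in enumerate(population):
        child = head + rng.normal(0.0, 0.1, head.shape)
        if not dominates(evaluate(shared, head), evaluate(shared, child)):
            population[i] = child

front = np.array([evaluate(shared, h) for h in population])
print("approximate Pareto front:\n", np.round(front, 3))
```

Because only the head parameters are mutated, every candidate reuses the same shared representation, which is the point of splitting the framework into a gradient-based stage and an evolutionary stage.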


