IEEE Congress on Evolutionary Computation

Simultaneously Evolving Deep Reinforcement Learning Models using Multifactorial optimization



Abstract

In recent years, Multifactorial optimization (MFO) has gained notable momentum in the research community. MFO is known for its inherent capability to efficiently address multiple optimization tasks at the same time, while transferring information among those tasks to improve their convergence speed. On the other hand, the quantum leap made by Deep Q Learning (DQL) in the Machine Learning field has made it possible to face Reinforcement Learning (RL) problems of unprecedented complexity. Unfortunately, complex DQL models often struggle to converge to optimal policies due to insufficient exploration or sparse rewards. In order to overcome these drawbacks, pre-trained models are widely harnessed via Transfer Learning, extrapolating knowledge acquired in a source task to the target task. Besides, meta-heuristic optimization has been shown to alleviate the lack of exploration of DQL models. This work proposes an MFO framework capable of simultaneously evolving several DQL models towards solving interrelated RL tasks. Specifically, our proposed framework blends together the benefits of meta-heuristic optimization, Transfer Learning and DQL to automate the process of knowledge transfer and policy learning of distributed RL agents. A thorough experimental study is presented and discussed so as to assess the performance of the framework, its comparison to the traditional methodology for Transfer Learning in terms of convergence speed and policy quality, and the inter-task relationships found and exploited over the search process.
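To make the idea of multifactorial search over policy parameters more concrete, the following is a minimal, illustrative sketch, not the paper's actual implementation, of a canonical MFEA-style loop that evolves flat weight vectors for two RL tasks within a single population. The evaluate_task function, the vector dimension, the population size and the random mating probability (RMP) are all hypothetical placeholders; in the setting described above, evaluate_task would instead run episodes of the corresponding environment with a DQL policy parameterized by the candidate weights and return the mean episode return.

import numpy as np

rng = np.random.default_rng(0)

DIM = 64     # flattened policy-network weights (hypothetical size)
POP = 40     # single population shared by both tasks
GENS = 50
RMP = 0.3    # random mating probability: chance of inter-task crossover


def evaluate_task(weights, task_id):
    # Hypothetical stand-in fitness. In the actual setting this would run
    # episodes of task `task_id` with a DQL policy parameterized by `weights`
    # and return the mean episode return (higher is better).
    target = 1.0 if task_id == 0 else -1.0
    return -float(np.sum((weights - target) ** 2))


def crossover(a, b):
    # Uniform crossover over the flat weight vectors.
    mask = rng.random(DIM) < 0.5
    return np.where(mask, a, b)


def mutate(w, scale=0.1):
    # Gaussian perturbation of the weights.
    return w + rng.normal(0.0, scale, DIM)


def scalar_fitness(fit, skill):
    # MFEA-style scalar fitness: rank individuals within each task and take
    # 1 / rank, so fitness values from different tasks become comparable.
    sf = np.zeros(len(fit))
    for t in (0, 1):
        idx = np.where(skill == t)[0]
        order = idx[np.argsort(-fit[idx])]
        sf[order] = 1.0 / (np.arange(len(order)) + 1)
    return sf


# Initialize one population; each individual carries a "skill factor",
# i.e. the task on which it is evaluated.
pop = rng.normal(0.0, 1.0, (POP, DIM))
skill = rng.integers(0, 2, POP)
fit = np.array([evaluate_task(pop[i], skill[i]) for i in range(POP)])

for gen in range(GENS):
    children, child_skill = [], []
    for _ in range(POP // 2):
        i, j = rng.integers(0, POP, 2)
        if skill[i] == skill[j] or rng.random() < RMP:
            # Assortative mating: same-task parents always cross; parents with
            # different skill factors cross only with probability RMP, which is
            # where knowledge transfer between tasks takes place.
            c1 = mutate(crossover(pop[i], pop[j]))
            c2 = mutate(crossover(pop[j], pop[i]))
            # Vertical cultural transmission: each child imitates the skill
            # factor of one of its parents at random.
            s1 = skill[i] if rng.random() < 0.5 else skill[j]
            s2 = skill[i] if rng.random() < 0.5 else skill[j]
        else:
            c1, c2 = mutate(pop[i]), mutate(pop[j])
            s1, s2 = skill[i], skill[j]
        children += [c1, c2]
        child_skill += [s1, s2]

    children = np.array(children)
    child_skill = np.array(child_skill)
    child_fit = np.array([evaluate_task(children[k], child_skill[k])
                          for k in range(len(children))])

    # Elitist survival on the merged population, ranked by scalar fitness.
    all_pop = np.vstack([pop, children])
    all_skill = np.concatenate([skill, child_skill])
    all_fit = np.concatenate([fit, child_fit])
    keep = np.argsort(-scalar_fitness(all_fit, all_skill))[:POP]
    pop, skill, fit = all_pop[keep], all_skill[keep], all_fit[keep]

for t in (0, 1):
    if (skill == t).any():
        print(f"task {t}: best fitness {fit[skill == t].max():.3f}")

The inter-task crossover triggered with probability RMP is the mechanism MFO exploits: genetic material tuned for one task can seed the search for a related task, which is what the abstract refers to as transferring information among tasks to improve convergence speed.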
