Journal: Concurrency, Practice and Experience

Scalable reinforcement learning on Cray XC



Abstract

Recent advancements in deep learning have made reinforcement learning (RL) applicable to a much broader range of decision-making problems. However, the emergence of reinforcement learning workloads brings multiple challenges to system resource management. RL applications continuously train a deep learning or machine learning model while interacting with uncertain simulation models. This new generation of AI applications imposes significant demands on system resources such as memory, storage, network, and compute. In this paper, we describe a typical RL application workflow and introduce the Ray distributed execution framework developed at the UC Berkeley RISELab. Ray includes the RLlib library for executing distributed reinforcement learning applications. We describe a recipe for deploying the Ray execution framework on Cray XC systems and demonstrate scaling of RLlib algorithms across multiple nodes of the system. We also explore performance characteristics across multiple CPU and GPU node types.


