Journal: Pattern Recognition Letters

Distributed Training of Deep Neural Networks with Spark: The MareNostrum Experience

Abstract

Deploying a distributed deep learning technology stack on a large parallel system is a very complex process that involves integrating and configuring several layers of both general-purpose and custom software. The details of such deployments are rarely described in the literature. This paper presents the experience gained during the deployment of a technology stack to enable deep learning workloads on MareNostrum, a petascale supercomputer. The components of a layered architecture based on Apache Spark are described, and the performance and scalability of the resulting system are evaluated. This is followed by a discussion of the impact of different configurations, including parallelism, storage, and networking alternatives, and of other aspects related to the execution of deep learning workloads on a traditional HPC setup. The derived conclusions should be useful to guide similarly complex deployments in the future. (C) 2019 Elsevier B.V. All rights reserved.