IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing

TensorFlow on State-of-the-Art HPC Clusters: A Machine Learning use Case

Abstract

The recent rapid growth of the data-flow programming paradigm has enabled the development of specialized architectures, e.g., for machine learning. The best-known example is Google's Tensor Processing Unit (TPU). Standard data centers, however, still cannot plan for large partitions dedicated to machine-learning-specific architectures. Within data centers, High-Performance Computing (HPC) clusters are highly parallel machines targeting a broad class of compute-intensive workflows; as such, they can be used to tackle machine learning challenges. Moreover, HPC architectures are changing rapidly, adding accelerators and instruction sets other than the classical x86 CPUs. In this shifting landscape, identifying the best hardware/software configurations for efficiently supporting machine learning workloads on HPC clusters is not trivial. In this paper, we consider the TensorFlow workflow for image recognition. We highlight the strong dependence of training-phase performance on the availability of arithmetic libraries optimized for the underlying architecture. Following the example of Intel, which leverages its MKL libraries to improve TensorFlow performance, we plugged the Arm Performance Libraries into TensorFlow and tested them on an HPC cluster based on Marvell ThunderX2 CPUs. We also performed a scalability study on three state-of-the-art HPC clusters based on different CPU architectures: x86 Intel Skylake, Arm-v8 Marvell ThunderX2, and PowerPC IBM Power9.
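The abstract does not state which metric the scalability study reports. As a minimal sketch, assuming training throughput (e.g., images per second) is measured at several node counts, parallel efficiency relative to the smallest run can be computed as below. The function name `scaling_efficiency` and the sample numbers are hypothetical and are not taken from the paper.

```python
from typing import Dict


def scaling_efficiency(throughput: Dict[int, float]) -> Dict[int, float]:
    """Parallel efficiency of data-parallel training (hypothetical helper).

    `throughput` maps node count -> measured training throughput (e.g. images/s).
    Efficiency at N nodes is X(N) / ((N / N0) * X(N0)), where N0 is the
    smallest node count measured; 1.0 means perfect linear scaling.
    """
    if not throughput:
        raise ValueError("need at least one measurement")
    base_nodes = min(throughput)
    base = throughput[base_nodes]
    if base <= 0:
        raise ValueError("baseline throughput must be positive")
    # Ideal throughput grows linearly with the node count relative to the baseline.
    return {
        n: x / ((n / base_nodes) * base)
        for n, x in sorted(throughput.items())
    }


if __name__ == "__main__":
    # Hypothetical numbers for illustration only; not results from the paper.
    measured = {1: 100.0, 2: 190.0, 4: 360.0}
    for nodes, eff in scaling_efficiency(measured).items():
        print(f"{nodes} node(s): {eff:.0%}")
```

Normalizing against the smallest measured run, rather than strictly one node, lets the same helper handle studies whose baseline starts at several nodes.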
