TensorFlow on State-of-the-Art HPC Clusters: A Machine Learning use Case

Abstract

The recent rapid growth of the data-flow programming paradigm has enabled the development of specific architectures, e.g., for machine learning. The best-known example is Google's Tensor Processing Unit (TPU). Standard data centers, however, still cannot foresee large partitions dedicated to machine-learning-specific architectures. Within data centers, High-Performance Computing (HPC) clusters are highly parallel machines targeting a broad class of compute-intensive workflows; as such, they can be used to tackle machine learning challenges. On top of this, HPC architectures are changing rapidly, incorporating accelerators and instruction sets other than the classical x86 CPUs. In this blurry scenario, identifying the best hardware/software configurations to efficiently support machine learning workloads on HPC clusters is not trivial. In this paper, we consider the workflow of TensorFlow for image recognition. We highlight the strong dependency of training-phase performance on the availability of arithmetic libraries optimized for the underlying architecture. Following the example of Intel, which leverages the MKL libraries to improve TensorFlow performance, we plugged the Arm Performance Libraries into TensorFlow and tested it on an HPC cluster based on Marvell ThunderX2 CPUs. We also performed a scalability study on three state-of-the-art HPC clusters based on different CPU architectures: x86 Intel Skylake, Arm-v8 Marvell ThunderX2, and PowerPC IBM Power9.

Bibliographic details

Source
IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing | 2019 | pp. 526-533 | 8 pages
Authors
Guillem Ramirez-Gargallo; Marta Garcia-Gasulla; Filippo Mantovani
Original format: PDF
Keywords
computer centres; data flow computing; image recognition; instruction sets; learning (artificial intelligence); microprocessor chips; parallel architectures; parallel machines; pattern clustering; software libraries
