TPC Technology Conference on Performance Evaluation and Benchmarking

Towards Evaluation of Tensorflow Performance in a Distributed Compute Environment



Abstract

Tensorflow (TF) is a highly popular Deep Learning (DL) software framework. Neural network training, a critical part of the DL workflow, is a computationally intensive process that can take days or even weeks. Achieving faster training times is therefore an active area of research and practice. TF supports multi-GPU parallelization, both within a single machine and across multiple physical servers. However, the distributed case is hard to use, and consequently almost all published performance data comes from the single-machine use case. To fill this gap, we benchmark Tensorflow in a GPU-equipped distributed environment. Our work evaluates the performance of various hardware and software combinations. In particular, we examine several types of interconnect technologies to determine their impact on performance. Our results show that with the right choice of input parameters and appropriate hardware, GPU-equipped general-purpose compute clusters can provide deep learning training performance comparable to specialized machines designed for AI workloads.
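The "hard to use" distributed setup the abstract alludes to typically starts with TF's `TF_CONFIG` cluster specification, which every worker must export before multi-worker training can begin. A minimal sketch follows; the host names and ports are hypothetical placeholders, not values from the paper:

```python
import json
import os

# Hypothetical two-node cluster spec. Each physical server runs the same
# training script, differing only in its "index" within the worker list.
cluster_spec = {
    "cluster": {
        "worker": ["node0.example:12345", "node1.example:12345"],
    },
    # This process is worker 0 (the chief); the other node would set index 1.
    "task": {"type": "worker", "index": 0},
}

# TF reads the cluster layout from the TF_CONFIG environment variable.
os.environ["TF_CONFIG"] = json.dumps(cluster_spec)

# With TF_CONFIG set, model code built under
# tf.distribute.MultiWorkerMirroredStrategy().scope() is replicated across
# the GPUs of all listed workers, with gradients synchronized over the
# interconnect -- the component whose impact the paper benchmarks.
```

The choice of interconnect matters here because `MultiWorkerMirroredStrategy` performs synchronous all-reduce of gradients between servers on every step, so inter-node bandwidth and latency directly bound the achievable scaling.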

