Published in: Performance Evaluation and Benchmarking for the Era of Artificial Intelligence (conference proceedings)

Towards Evaluation of Tensorflow Performance in a Distributed Compute Environment


Abstract

Tensorflow (TF) is a highly popular Deep Learning (DL) software framework. Neural network training, a critical part of the DL workflow, is a computationally intensive process that can take days or even weeks. Achieving faster training times is therefore an active area of research and practice. TF supports multi-GPU parallelization, both within a single machine and across multiple physical servers. However, the distributed case is hard to use, and consequently almost all published performance data comes from the single-machine use case. To fill this gap, we benchmark Tensorflow in a GPU-equipped distributed environment. Our work evaluates the performance of various hardware and software combinations. In particular, we examine several types of interconnect technologies to determine their impact on performance. Our results show that with the right choice of input parameters and appropriate hardware, GPU-equipped general-purpose compute clusters can provide deep learning training performance comparable to specialized machines designed for AI workloads.
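The abstract notes that TF supports parallelization across multiple physical servers but that the distributed case is hard to use. As a hedged illustration (not taken from the paper), the sketch below shows how a multi-worker TensorFlow cluster is typically described via the `TF_CONFIG` environment variable, which `tf.distribute.MultiWorkerMirroredStrategy` reads at startup; the hostnames and ports are placeholders.

```python
import json
import os


def make_tf_config(workers, task_index):
    """Build the TF_CONFIG JSON string for the worker with the given index.

    `workers` is a list of "host:port" strings, one per physical server;
    `task_index` identifies which entry this process is.
    """
    return json.dumps({
        "cluster": {"worker": workers},
        "task": {"type": "worker", "index": task_index},
    })


# Placeholder two-node cluster; each server exports its own TF_CONFIG
# (with its own task index) before launching the same training script.
workers = ["node0.example.com:12345", "node1.example.com:12345"]
os.environ["TF_CONFIG"] = make_tf_config(workers, task_index=0)
```

With this variable set on every node, the training script can construct `tf.distribute.MultiWorkerMirroredStrategy()` and build the model under its scope; the strategy discovers its peers from the cluster spec above.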


