Published in: Performance Evaluation and Benchmarking for the Era of Artificial Intelligence (conference proceedings)

Towards Evaluation of Tensorflow Performance in a Distributed Compute Environment


Abstract

Tensorflow (TF) is a highly popular Deep Learning (DL) software framework. Neural network training, a critical part of the DL workflow, is a computationally intensive process that can take days or even weeks. Achieving faster training times is therefore an active area of research and practice. TF supports multi-GPU parallelization, both within a single machine and across multiple physical servers. However, the distributed case is hard to use, and consequently almost all published performance data comes from the single-machine use case. To fill this gap, we benchmark Tensorflow in a GPU-equipped distributed environment. Our work evaluates the performance of various hardware and software combinations. In particular, we examine several types of interconnect technologies to determine their impact on performance. Our results show that with the right choice of input parameters and appropriate hardware, GPU-equipped general-purpose compute clusters can provide deep learning training performance comparable to specialized machines designed for AI workloads.
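The abstract notes that TF supports parallelization across multiple physical servers but that the distributed case is hard to use. As a hedged illustration (not taken from the paper), the sketch below shows how a multi-worker TensorFlow cluster is typically described via the `TF_CONFIG` environment variable, which `tf.distribute.MultiWorkerMirroredStrategy` reads at startup; the hostnames and ports are placeholders.

```python
import json
import os


def make_tf_config(workers, task_index):
    """Build the TF_CONFIG JSON string for the worker with the given index.

    `workers` is a list of "host:port" strings, one per physical server;
    `task_index` identifies which entry this process is.
    """
    return json.dumps({
        "cluster": {"worker": workers},
        "task": {"type": "worker", "index": task_index},
    })


# Placeholder two-node cluster; each server exports its own TF_CONFIG
# (with its own task index) before launching the same training script.
workers = ["node0.example.com:12345", "node1.example.com:12345"]
os.environ["TF_CONFIG"] = make_tf_config(workers, task_index=0)
```

With this variable set on every node, the training script can construct `tf.distribute.MultiWorkerMirroredStrategy()` and build the model under its scope; the strategy discovers its peers from the cluster spec above.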


