International Conference on Algorithms and Architectures for Parallel Processing

Comparative Study of Distributed Deep Learning Tools on Supercomputers



Abstract

With the growth in the scale of data sets and neural networks, training time is increasing rapidly. Distributed parallel training has been proposed to accelerate deep neural network training, and most efforts target GPU clusters. This paper focuses on the performance of distributed parallel training on the CPU clusters of supercomputer systems. Using resources of the "Tianhe-2" supercomputer system, we conduct an extensive evaluation of the performance of popular deep learning tools, including Caffe, TensorFlow, and BigDL, and test several deep neural network models, including Auto-Encoder, LeNet, AlexNet, and ResNet. The experimental results show that Caffe performs best in communication efficiency and scalability. BigDL is the fastest in computing speed, benefiting from its CPU optimizations, but it suffers from long communication delays due to its dependency on the MapReduce framework. The insights and conclusions from our evaluation provide a significant reference for improving the utilization of supercomputer resources in distributed deep learning.
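The abstract concerns distributed data-parallel training of neural networks on CPU clusters with tools such as TensorFlow. As an illustration only, not code from the paper, the sketch below shows a minimal TensorFlow multi-worker data-parallel setup training a LeNet-style model on CPU nodes; the host names, port, batch size, and dataset are placeholder assumptions.

```python
# Minimal, hypothetical sketch of multi-worker data-parallel training on CPU
# nodes with TensorFlow; not the paper's experimental code. Hosts and port
# are placeholders, and each worker would run this script with its own index.
import json
import os

import tensorflow as tf

# Each worker sets TF_CONFIG with the full cluster spec and its own rank
# before the strategy is created.
os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {"worker": ["node0:12345", "node1:12345"]},  # placeholder hosts
    "task": {"type": "worker", "index": 0},                 # this worker's rank
})

# Synchronous data-parallel replication; gradients are all-reduced each step.
strategy = tf.distribute.MultiWorkerMirroredStrategy()

with strategy.scope():
    # A LeNet-style CNN, one of the model families mentioned in the abstract.
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(6, 5, activation="relu", input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(16, 5, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(120, activation="relu"),
        tf.keras.layers.Dense(84, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="sgd",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )

# MNIST stands in for the datasets used in the study.
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None].astype("float32") / 255.0

# The global batch is split across workers each step.
model.fit(x_train, y_train, batch_size=256, epochs=1)
```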