IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing

Scaling a Convolutional Neural Network for Classification of Adjective Noun Pairs with TensorFlow on GPU Clusters



Abstract

Deep neural networks have gained popularity in recent years, obtaining outstanding results in a wide range of applications such as computer vision, in both academia and multiple industry areas. The progress made in recent years cannot be understood without taking into account the technological advances in key domains such as High Performance Computing, and more specifically in the Graphics Processing Unit (GPU) domain. These kinds of deep neural networks need massive amounts of data to effectively train the millions of parameters they contain, and this training can take days or weeks depending on the hardware used. In this work, we present how the training of a deep neural network can be parallelized on a distributed GPU cluster. The effect of distributing the training process is addressed from two points of view. First, the scalability of the task and its performance in the distributed setting are analyzed. Second, the impact of distributed training methods on the training time and final accuracy of the models is studied. We used TensorFlow on a GPU cluster of servers, each with two K80 GPU cards, at the Barcelona Supercomputing Center (BSC). The results show an improvement in both areas. On the one hand, the experiments show promising results for training a neural network faster: the training time is decreased from 106 hours to 16 hours. On the other hand, we observe that increasing the number of GPUs in one node raises the throughput, in images per second, in a near-linear way. Moreover, an additional distributed speedup of 10.3 is achieved with 16 nodes, taking the speedup of one node as the baseline.
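The abstract does not spell out the distribution scheme, but distributed TensorFlow training of this kind is typically built on synchronous data parallelism: each worker computes gradients on its own shard of the batch, the gradients are averaged (an all-reduce), and the shared parameters are updated once per step. The following is a minimal illustrative sketch of that scheme in plain Python on a toy linear model; the worker count, learning rate, and model are hypothetical and are not taken from the paper.

```python
import random

def gradient(w, b, shard):
    """Average gradient of the MSE loss for y ~ w*x + b over one data shard."""
    gw = gb = 0.0
    for x, y in shard:
        err = (w * x + b) - y
        gw += 2 * err * x
        gb += 2 * err
    n = len(shard)
    return gw / n, gb / n

def train_data_parallel(data, num_workers=4, lr=0.1, steps=500):
    """Synchronous data-parallel SGD: each worker holds a shard of the data,
    and per-step gradients are averaged across workers (the role an
    all-reduce plays on a real GPU cluster) before one shared update."""
    shards = [data[i::num_workers] for i in range(num_workers)]
    w, b = 0.0, 0.0
    for _ in range(steps):
        # Each worker's gradient would be computed in parallel on its own GPU.
        grads = [gradient(w, b, shard) for shard in shards]
        gw = sum(g[0] for g in grads) / num_workers  # averaged, as in all-reduce
        gb = sum(g[1] for g in grads) / num_workers
        w -= lr * gw
        b -= lr * gb
    return w, b

# Synthetic data drawn from y = 3x + 1.
random.seed(0)
data = [(x, 3 * x + 1) for x in (random.uniform(-1, 1) for _ in range(64))]
w, b = train_data_parallel(data)
print(round(w, 2), round(b, 2))  # converges toward w = 3, b = 1
```

Because the averaged gradient equals the full-batch gradient, synchronous data parallelism leaves the optimization trajectory unchanged while dividing the per-step compute across workers; the cost is the communication of the all-reduce, which is why the reported distributed speedup of 10.3 on 16 nodes (a parallel efficiency of roughly 10.3/16, about 64%) falls short of linear.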
