Journal: Future Generation Computer Systems

Exploiting potential of deep neural networks by layer-wise fine-grained parallelism



Abstract

Deep neural networks (DNNs) have become increasingly important for big data analysis. For extreme-scale computing they usually rely on data parallelism or model parallelism. However, both approaches achieve their performance gains mainly through coarse-grained parallelization schemes, and neither can fully exploit the parallelism that many-core systems such as GPUs offer for neural network models. This paper presents FiLayer, a new fine-grained parallelism strategy based on layer-wise parallelization. It has two components: inter-layer parallelism and intra-layer parallelism. Inter-layer parallelism processes several neighboring layers of a network model in a pipelined manner. Intra-layer parallelism separates the operations in one layer into several parts that are processed concurrently. Both fine-grained methods are implemented with CUDA streams. The paper gives a mathematical analysis of how the number of fragments affects the performance of inter-layer parallelism, and of how the number of CUDA streams affects the performance of intra-layer parallelism. The approach is implemented on top of Caffe and evaluated on representative datasets, including CIFAR100 and ImageNet. The evaluation results show that it helps Caffe achieve remarkable speedups, which is valuable for big data analysis. (C) 2019 Elsevier B.V. All rights reserved.
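The two components can be illustrated with a toy Python model. This is a minimal sketch under stated assumptions, not FiLayer's implementation or the paper's own analysis, and all function names are hypothetical. `pipeline_time` assumes that each layer's cost scales linearly with fragment size and that every fragment adds a fixed per-layer overhead (a hypothetical parameter); it uses the standard makespan of a linear pipeline of identical jobs. `intra_layer_forward` splits a dense layer's output neurons into independent chunks and runs them on a thread pool, which stands in for CUDA streams and illustrates only the decomposition, not GPU speedups.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


# --- Inter-layer parallelism: toy timing model for a fragmented pipeline ---
def pipeline_time(stage_times: Sequence[float], k: int, overhead: float) -> float:
    """Makespan when a mini-batch is cut into k fragments that flow through
    consecutive layers as a pipeline.

    Hypothetical model: a layer's time scales linearly with fragment size,
    and each fragment adds a fixed `overhead` in every layer.
    """
    if k < 1:
        raise ValueError("k must be >= 1")
    frag = [t / k + overhead for t in stage_times]
    # The first fragment passes through every layer; each of the remaining
    # k - 1 fragments finishes one bottleneck-layer interval later.
    return sum(frag) + (k - 1) * max(frag)


def best_fragment_count(stage_times: Sequence[float], overhead: float,
                        max_k: int = 64) -> int:
    """Fragment count in 1..max_k that minimizes the modelled makespan."""
    return min(range(1, max_k + 1),
               key=lambda k: pipeline_time(stage_times, k, overhead))


# --- Intra-layer parallelism: split one layer's work into independent parts ---
def _dense_rows(weights: Sequence[Sequence[float]], x: Sequence[float],
                rows: range) -> List[float]:
    """Compute the selected output neurons of y = W x."""
    return [sum(w * xi for w, xi in zip(weights[r], x)) for r in rows]


def intra_layer_forward(weights: Sequence[Sequence[float]], x: Sequence[float],
                        parts: int) -> List[float]:
    """Evaluate a dense layer by splitting its output neurons into `parts`
    contiguous chunks processed concurrently (threads stand in for CUDA streams)."""
    if parts < 1:
        raise ValueError("parts must be >= 1")
    n = len(weights)
    bounds = [(i * n // parts, (i + 1) * n // parts) for i in range(parts)]
    with ThreadPoolExecutor(max_workers=parts) as pool:
        # map() preserves input order, so chunks concatenate in neuron order.
        chunks = list(pool.map(lambda b: _dense_rows(weights, x, range(*b)), bounds))
    return [v for chunk in chunks for v in chunk]
```

For three layers that each take 4 ms on the full mini-batch, with a hypothetical 0.1 ms per-fragment overhead, `best_fragment_count([4.0, 4.0, 4.0], 0.1)` returns 9 in this model: more fragments shorten the pipeline fill time but add overhead, which illustrates why the fragment number matters for inter-layer performance. `intra_layer_forward` returns the same output for any number of parts, since each chunk computes a disjoint set of neurons.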
