
Distributed Learning of CNNs on Heterogeneous CPU/GPU Architectures



Abstract

Convolutional neural networks (CNNs) have proven to be powerful classification tools in tasks ranging from check reading to medical diagnosis, coming close to human perception and in some cases surpassing it. However, the problems to be solved keep growing larger and more complex, which translates into larger CNNs and training times so long that even the adoption of Graphics Processing Units (GPUs) cannot keep up with them. This problem is partially addressed by using more processing units and the distributed training methods offered by several frameworks dedicated to neural network training, such as Caffe, Torch, or TensorFlow. However, these techniques do not take full advantage of the parallelization opportunities that CNNs offer, nor of the cooperative use of heterogeneous devices with different processing capabilities, clock speeds, memory sizes, and other characteristics. This paper presents a new method for the parallel training of CNNs in which only the convolutional layer is distributed. The paper analyzes the influence of network size, bandwidth, batch size, the number of devices and their processing capabilities, and other parameters. Results show that this technique can reduce training time without affecting classification performance, for both CPUs and GPUs. For the CIFAR-10 dataset, using a CNN with two convolutional layers of 500 and 1500 kernels, respectively, the best speedups reach 3.28x with four CPUs and 2.45x with three GPUs. Larger datasets will spend more than 60-90% of their processing time computing convolutions, and speedups will tend to increase accordingly.
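To make the core idea concrete, below is a minimal sketch (not the authors' implementation; the device speeds, helper names, and shapes are hypothetical) of distributing only the convolutional layer: each mini-batch is split across heterogeneous devices in proportion to their relative throughput, and the resulting feature maps are gathered before the next layer. An Amdahl-style bound also puts the reported numbers in context: if a fraction p of training time goes to convolutions distributed over n devices, the speedup is roughly 1 / ((1 - p) + p / n); for instance, p ≈ 0.93 and n = 4 give about 3.3x, consistent with the reported 3.28x.

import numpy as np

def conv2d_valid(image, kernel):
    # Naive single-channel "valid" convolution as used in CNNs;
    # a stand-in for whatever conv op each device actually runs.
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.empty((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

def split_batch(batch_size, speeds):
    # Partition a mini-batch proportionally to each device's relative speed,
    # so faster devices get more images and all finish at roughly the same time.
    shares = np.floor(batch_size * np.asarray(speeds) / np.sum(speeds)).astype(int)
    shares[-1] += batch_size - shares.sum()  # hand the rounding remainder to the last device
    return shares

speeds = [4.0, 1.0, 1.0]                # hypothetical pool: one GPU, two slower CPUs
batch = np.random.rand(32, 28, 28)      # 32 single-channel 28x28 images
kernel = np.random.rand(5, 5)

shards = np.split(batch, np.cumsum(split_batch(len(batch), speeds))[:-1])
# In a real system each shard runs on its own device in parallel;
# a sequential loop stands in for that here to show the data flow.
outputs = [np.stack([conv2d_valid(img, kernel) for img in shard]) for shard in shards]
feature_maps = np.concatenate(outputs)  # gather for the (non-distributed) next layer
print(feature_maps.shape)               # -> (32, 24, 24)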

Bibliographic details

  • Source
    Applied Artificial Intelligence | 2018, Issue 10 | pp. 822-844 | 23 pages
  • Author affiliations

    Univ Coimbra, Dept Elect & Comp Engn, Inst Telecomunicacoes, Coimbra, Portugal;

    Univ Beira Interior, Dept Informat, Inst Telecomunicacoes, Covilha, Portugal;

    Ecole Polytech Fed Lausanne, Sch Comp & Commun Sci, Lausanne, Switzerland;

  • Indexing information
  • Original format: PDF
  • Language of text: eng
  • CLC number
  • Keywords
