
A Bi-layered Parallel Training Architecture for Large-Scale Convolutional Neural Networks



Abstract

Benefitting from large-scale training datasets and the complex training network, Convolutional Neural Networks (CNNs) are widely applied in various fields with high accuracy. However, the training process of CNNs is very time-consuming, where large amounts of training samples and iterative operations are required to obtain high-quality weight parameters. In this paper, we focus on the time-consuming training process of large-scale CNNs and propose a Bi-layered Parallel Training (BPT-CNN) architecture in distributed computing environments. BPT-CNN consists of two main components: (a) an outer-layer parallel training for multiple CNN subnetworks on separate data subsets, and (b) an inner-layer parallel training for each subnetwork. In the outer-layer parallelism, we address critical issues of distributed and parallel computing, including data communication, synchronization, and workload balance. A heterogeneous-aware Incremental Data Partitioning and Allocation (IDPA) strategy is proposed, where large-scale training datasets are partitioned and allocated to the computing nodes in batches according to their computing power. To minimize synchronization waiting during the global weight update process, an Asynchronous Global Weight Update (AGWU) strategy is proposed. In the inner-layer parallelism, we further accelerate the training process for each CNN subnetwork on each computer, where the computation steps of the convolutional layer and the local weight training are parallelized based on task-parallelism. We introduce task decomposition and scheduling strategies with the objectives of thread-level load balancing and minimum waiting time for critical paths. Extensive experimental results indicate that the proposed BPT-CNN effectively improves the training performance of CNNs while maintaining accuracy.
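
The two outer-layer strategies described in the abstract can be illustrated with a minimal sketch: training data is partitioned across nodes in proportion to their computing power (in the spirit of IDPA), and each worker pushes its gradient to a shared parameter store as soon as it is ready instead of waiting at a global barrier (in the spirit of AGWU). The names below (partition_by_speed, ParameterServer) and the toy gradient are illustrative assumptions, not the paper's implementation.

```python
# A minimal sketch of heterogeneity-aware data partitioning and asynchronous
# global weight updates, assuming simple NumPy workers on one machine.
import threading
import numpy as np


def partition_by_speed(num_samples: int, node_speeds: list[float]) -> list[range]:
    """Split sample indices across nodes in proportion to their computing power."""
    total = sum(node_speeds)
    shares = [int(num_samples * s / total) for s in node_speeds]
    shares[-1] += num_samples - sum(shares)  # rounding remainder goes to the last node
    bounds = np.cumsum([0] + shares)
    return [range(bounds[i], bounds[i + 1]) for i in range(len(node_speeds))]


class ParameterServer:
    """Holds the global weights; each worker pushes gradients without waiting for others."""

    def __init__(self, init_weights: np.ndarray, lr: float = 0.01):
        self.weights = init_weights.copy()
        self.lr = lr
        self._lock = threading.Lock()

    def pull(self) -> np.ndarray:
        with self._lock:
            return self.weights.copy()

    def push(self, gradient: np.ndarray) -> None:
        # Asynchronous update: apply the gradient as soon as it arrives,
        # rather than blocking until every worker has reported in.
        with self._lock:
            self.weights -= self.lr * gradient


def worker(server: ParameterServer, shard: np.ndarray, steps: int) -> None:
    for _ in range(steps):
        w = server.pull()
        # Placeholder gradient of a squared-error objective on this worker's shard.
        grad = 2 * (w - shard.mean(axis=0))
        server.push(grad)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=(1000, 8))

    # Three heterogeneous nodes: the first is twice as fast as the others,
    # so it receives roughly half of the samples.
    shards = [data[list(r)] for r in partition_by_speed(len(data), [2.0, 1.0, 1.0])]

    server = ParameterServer(np.zeros(8))
    threads = [threading.Thread(target=worker, args=(server, s, 50)) for s in shards]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("final weights:", np.round(server.weights, 3))
```

Partitioning by node speed keeps faster machines from idling on undersized shards, and the lock-protected push mirrors the asynchronous update that avoids a global synchronization barrier during the weight-update step.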

Bibliographic Information

  • Source: IEEE Transactions on Parallel and Distributed Systems
  • Author Affiliations

    Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410006, Hunan, Peoples R China | Natl Supercomp Ctr, Changsha 410082, Hunan, Peoples R China;

    Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410006, Hunan, Peoples R China | Natl Supercomp Ctr, Changsha 410082, Hunan, Peoples R China;

    COMSATS Univ Islamabad, Abbottabad 45550, Pakistan | Qatar Univ, Doha 2713, Qatar;

    Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410006, Hunan, Peoples R China | Natl Supercomp Ctr, Changsha 410082, Hunan, Peoples R China;

    Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410006, Hunan, Peoples R China | Natl Supercomp Ctr, Changsha 410082, Hunan, Peoples R China | SUNY Coll New Paltz, Dept Comp Sci, New Paltz, NY 12561, USA;

    Univ Illinois, Dept Comp Sci, Chicago, IL 60607, USA | Tsinghua Univ, Inst Data Sci, Beijing 100084, Peoples R China;

  • Indexing
  • Format: PDF
  • Language: eng
  • CLC Classification
  • Keywords

    Big data; bi-layered parallel computing; convolutional neural networks; deep learning; distributed computing;

