
A Bi-layered Parallel Training Architecture for Large-Scale Convolutional Neural Networks


Abstract

Benefiting from large-scale training datasets and complex training networks, Convolutional Neural Networks (CNNs) are widely applied in various fields with high accuracy. However, the training process of CNNs is very time-consuming, as large amounts of training samples and iterative operations are required to obtain high-quality weight parameters. In this paper, we focus on the time-consuming training process of large-scale CNNs and propose a Bi-layered Parallel Training (BPT-CNN) architecture for distributed computing environments. BPT-CNN consists of two main components: (a) an outer-layer parallel training of multiple CNN subnetworks on separate data subsets, and (b) an inner-layer parallel training of each subnetwork. In the outer-layer parallelism, we address critical issues of distributed and parallel computing, including data communication, synchronization, and workload balance. A heterogeneous-aware Incremental Data Partitioning and Allocation (IDPA) strategy is proposed, in which large-scale training datasets are partitioned and allocated to the computing nodes in batches according to their computing power. To minimize synchronization waiting during the global weight update process, an Asynchronous Global Weight Update (AGWU) strategy is proposed. In the inner-layer parallelism, we further accelerate the training process of each CNN subnetwork on each computer, where the computation steps of the convolutional layer and the local weight training are parallelized based on task parallelism. We introduce task decomposition and scheduling strategies with the objectives of thread-level load balancing and minimum waiting time on critical paths. Extensive experimental results indicate that the proposed BPT-CNN effectively improves the training performance of CNNs while maintaining accuracy.
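The IDPA strategy described above allocates training data to nodes in proportion to their computing power. The sketch below illustrates one way such heterogeneity-proportional partitioning could look; the function name, node names, and the simple proportional rule are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the paper's code): split a training set across
# heterogeneous nodes in proportion to their measured computing power,
# in the spirit of the IDPA strategy summarized in the abstract.

def partition_dataset(num_samples: int, node_power: dict) -> dict:
    """Assign each node a number of samples proportional to its compute power."""
    total_power = sum(node_power.values())
    shares = {node: int(num_samples * power / total_power)
              for node, power in node_power.items()}
    # Hand any remainder from integer truncation to the fastest node.
    remainder = num_samples - sum(shares.values())
    fastest = max(node_power, key=node_power.get)
    shares[fastest] += remainder
    return shares

if __name__ == "__main__":
    # Example: three nodes with relative compute powers 1.0, 2.0, and 4.0.
    print(partition_dataset(70_000, {"node-a": 1.0, "node-b": 2.0, "node-c": 4.0}))
    # -> {'node-a': 10000, 'node-b': 20000, 'node-c': 40000}
```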

Bibliographic Details

  • Source: IEEE Transactions on Parallel and Distributed Systems
  • Author Affiliations

    Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410006, Hunan, Peoples R China|Natl Supercomp Ctr, Changsha 410082, Hunan, Peoples R China;

    Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410006, Hunan, Peoples R China|Natl Supercomp Ctr, Changsha 410082, Hunan, Peoples R China;

    COMSATS Univ Islamabad, Abbottabad 45550, Pakistan|Qatar Univ, Doha 2713, Qatar;

    Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410006, Hunan, Peoples R China|Natl Supercomp Ctr, Changsha 410082, Hunan, Peoples R China;

    Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410006, Hunan, Peoples R China|Natl Supercomp Ctr, Changsha 410082, Hunan, Peoples R China|SUNY Coll New Paltz, Dept Comp Sci, New Paltz, NY 12561 USA;

    Univ Illinois, Dept Comp Sci, Chicago, IL 60607 USA|Tsinghua Univ, Inst Data Sci, Beijing 100084, Peoples R China;

  • Indexing Information
  • Original Format: PDF
  • Language: eng
  • CLC Classification
  • Keywords

    Big data; bi-layered parallel computing; convolutional neural networks; deep learning; distributed computing;

