
A Bi-layered Parallel Training Architecture for Large-Scale Convolutional Neural Networks



Abstract

Benefitting from large-scale training datasets and the complex training network, Convolutional Neural Networks (CNNs) are widely applied in various fields with high accuracy. However, the training process of CNNs is very time-consuming, where large amounts of training samples and iterative operations are required to obtain high-quality weight parameters. In this paper, we focus on the time-consuming training process of large-scale CNNs and propose a Bi-layered Parallel Training (BPT-CNN) architecture in distributed computing environments. BPT-CNN consists of two main components: (a) an outer-layer parallel training for multiple CNN subnetworks on separate data subsets, and (b) an inner-layer parallel training for each subnetwork. In the outer-layer parallelism, we address critical issues of distributed and parallel computing, including data communication, synchronization, and workload balance. A heterogeneous-aware Incremental Data Partitioning and Allocation (IDPA) strategy is proposed, where large-scale training datasets are partitioned and allocated to the computing nodes in batches according to their computing power. To minimize synchronization waiting during the global weight update process, an Asynchronous Global Weight Update (AGWU) strategy is proposed. In the inner-layer parallelism, we further accelerate the training process for each CNN subnetwork on each computer, where the computation steps of the convolutional layer and the local weight training are parallelized based on task-parallelism. We introduce task decomposition and scheduling strategies with the objectives of thread-level load balancing and minimum waiting time for critical paths. Extensive experimental results indicate that the proposed BPT-CNN effectively improves the training performance of CNNs while maintaining accuracy.
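
The two outer-layer strategies described in the abstract can be illustrated with a minimal sketch: training data is partitioned across nodes in proportion to their computing power (in the spirit of IDPA), and each worker pushes its gradient to a shared parameter store as soon as it is ready instead of waiting at a global barrier (in the spirit of AGWU). The names below (partition_by_speed, ParameterServer) and the toy gradient are illustrative assumptions, not the paper's implementation.

```python
# A minimal sketch of heterogeneity-aware data partitioning and asynchronous
# global weight updates, assuming simple NumPy workers on one machine.
import threading
import numpy as np


def partition_by_speed(num_samples: int, node_speeds: list[float]) -> list[range]:
    """Split sample indices across nodes in proportion to their computing power."""
    total = sum(node_speeds)
    shares = [int(num_samples * s / total) for s in node_speeds]
    shares[-1] += num_samples - sum(shares)  # rounding remainder goes to the last node
    bounds = np.cumsum([0] + shares)
    return [range(bounds[i], bounds[i + 1]) for i in range(len(node_speeds))]


class ParameterServer:
    """Holds the global weights; each worker pushes gradients without waiting for others."""

    def __init__(self, init_weights: np.ndarray, lr: float = 0.01):
        self.weights = init_weights.copy()
        self.lr = lr
        self._lock = threading.Lock()

    def pull(self) -> np.ndarray:
        with self._lock:
            return self.weights.copy()

    def push(self, gradient: np.ndarray) -> None:
        # Asynchronous update: apply the gradient as soon as it arrives,
        # rather than blocking until every worker has reported in.
        with self._lock:
            self.weights -= self.lr * gradient


def worker(server: ParameterServer, shard: np.ndarray, steps: int) -> None:
    for _ in range(steps):
        w = server.pull()
        # Placeholder gradient of a squared-error objective on this worker's shard.
        grad = 2 * (w - shard.mean(axis=0))
        server.push(grad)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=(1000, 8))

    # Three heterogeneous nodes: the first is twice as fast as the others,
    # so it receives roughly half of the samples.
    shards = [data[list(r)] for r in partition_by_speed(len(data), [2.0, 1.0, 1.0])]

    server = ParameterServer(np.zeros(8))
    threads = [threading.Thread(target=worker, args=(server, s, 50)) for s in shards]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("final weights:", np.round(server.weights, 3))
```

Partitioning by node speed keeps faster machines from idling on undersized shards, and the lock-protected push mirrors the asynchronous update that avoids a global synchronization barrier during the weight-update step.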

Bibliographic Information

  • Source: IEEE Transactions on Parallel and Distributed Systems
  • Author Affiliations

    Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410006, Hunan, Peoples R China | Natl Supercomp Ctr, Changsha 410082, Hunan, Peoples R China;

    Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410006, Hunan, Peoples R China | Natl Supercomp Ctr, Changsha 410082, Hunan, Peoples R China;

    COMSATS Univ Islamabad, Abbottabad 45550, Pakistan | Qatar Univ, Doha 2713, Qatar;

    Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410006, Hunan, Peoples R China | Natl Supercomp Ctr, Changsha 410082, Hunan, Peoples R China;

    Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410006, Hunan, Peoples R China | Natl Supercomp Ctr, Changsha 410082, Hunan, Peoples R China | SUNY Coll New Paltz, Dept Comp Sci, New Paltz, NY 12561, USA;

    Univ Illinois, Dept Comp Sci, Chicago, IL 60607, USA | Tsinghua Univ, Inst Data Sci, Beijing 100084, Peoples R China;

  • Indexing
  • Format: PDF
  • Language: eng
  • CLC Classification
  • Keywords

    Big data; bi-layered parallel computing; convolutional neural networks; deep learning; distributed computing;

