
A Bi-layered Parallel Training Architecture for Large-Scale Convolutional Neural Networks


Abstract

Benefiting from large-scale training datasets and complex training networks, Convolutional Neural Networks (CNNs) are widely applied in various fields with high accuracy. However, the training process of CNNs is very time-consuming, as large amounts of training samples and iterative operations are required to obtain high-quality weight parameters. In this paper, we focus on the time-consuming training process of large-scale CNNs and propose a Bi-layered Parallel Training (BPT-CNN) architecture for distributed computing environments. BPT-CNN consists of two main components: (a) an outer-layer parallel training of multiple CNN subnetworks on separate data subsets, and (b) an inner-layer parallel training of each subnetwork. In the outer-layer parallelism, we address critical issues of distributed and parallel computing, including data communication, synchronization, and workload balance. A heterogeneous-aware Incremental Data Partitioning and Allocation (IDPA) strategy is proposed, in which large-scale training datasets are partitioned and allocated to the computing nodes in batches according to their computing power. To minimize synchronization waiting during the global weight update process, an Asynchronous Global Weight Update (AGWU) strategy is proposed. In the inner-layer parallelism, we further accelerate the training process of each CNN subnetwork on each computer, where the computation steps of the convolutional layer and the local weight training are parallelized based on task parallelism. We introduce task decomposition and scheduling strategies with the objectives of thread-level load balancing and minimum waiting time on critical paths. Extensive experimental results indicate that the proposed BPT-CNN effectively improves the training performance of CNNs while maintaining accuracy.
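The IDPA strategy described above allocates training data to nodes in proportion to their computing power. The sketch below illustrates one way such heterogeneity-proportional partitioning could look; the function name, node names, and the simple proportional rule are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the paper's code): split a training set across
# heterogeneous nodes in proportion to their measured computing power,
# in the spirit of the IDPA strategy summarized in the abstract.

def partition_dataset(num_samples: int, node_power: dict) -> dict:
    """Assign each node a number of samples proportional to its compute power."""
    total_power = sum(node_power.values())
    shares = {node: int(num_samples * power / total_power)
              for node, power in node_power.items()}
    # Hand any remainder from integer truncation to the fastest node.
    remainder = num_samples - sum(shares.values())
    fastest = max(node_power, key=node_power.get)
    shares[fastest] += remainder
    return shares

if __name__ == "__main__":
    # Example: three nodes with relative compute powers 1.0, 2.0, and 4.0.
    print(partition_dataset(70_000, {"node-a": 1.0, "node-b": 2.0, "node-c": 4.0}))
    # -> {'node-a': 10000, 'node-b': 20000, 'node-c': 40000}
```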

Bibliographic Details

  • Source: IEEE Transactions on Parallel and Distributed Systems
  • Author Affiliations

    Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410006, Hunan, Peoples R China|Natl Supercomp Ctr, Changsha 410082, Hunan, Peoples R China;

    Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410006, Hunan, Peoples R China|Natl Supercomp Ctr, Changsha 410082, Hunan, Peoples R China;

    COMSATS Univ Islamabad, Abbottabad 45550, Pakistan|Qatar Univ, Doha 2713, Qatar;

    Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410006, Hunan, Peoples R China|Natl Supercomp Ctr, Changsha 410082, Hunan, Peoples R China;

    Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410006, Hunan, Peoples R China|Natl Supercomp Ctr, Changsha 410082, Hunan, Peoples R China|SUNY Coll New Paltz, Dept Comp Sci, New Paltz, NY 12561 USA;

    Univ Illinois, Dept Comp Sci, Chicago, IL 60607 USA|Tsinghua Univ, Inst Data Sci, Beijing 100084, Peoples R China;

  • Indexing Information
  • Original Format: PDF
  • Language: eng
  • CLC Classification
  • Keywords

    Big data; bi-layered parallel computing; convolutional neural networks; deep learning; distributed computing;

