Fortschritte der Physik

On-Chip Communication Network for Efficient Training of Deep Convolutional Networks on Heterogeneous Manycore Systems



Abstract

Convolutional Neural Networks (CNNs) have shown a great deal of success in diverse application domains including computer vision, speech recognition, and natural language processing. However, as the size of datasets and the depth of neural network architectures continue to grow, it is imperative to design high-performance and energy-efficient computing hardware for training CNNs. In this paper, we consider the problem of designing specialized CPU-GPU based heterogeneous manycore systems for energy-efficient training of CNNs. It has already been shown that the typical on-chip communication infrastructures employed in conventional CPU-GPU based heterogeneous manycore platforms are unable to handle both CPU and GPU communication requirements efficiently. To address this issue, we first analyze the on-chip traffic patterns that arise from the computational processes associated with training two deep CNN architectures, namely, LeNet and CDBNet, to perform image classification. By leveraging this knowledge, we design a hybrid Network-on-Chip (NoC) architecture, which consists of both wireline and wireless links, to improve the performance of CPU-GPU based heterogeneous manycore platforms running the above-mentioned CNN training workloads. The proposed NoC achieves 1.8x reduction in network latency and improves the network throughput by a factor of 2.2 for training CNNs, when compared to a highly-optimized wireline mesh NoC. For the considered CNN workloads, these network-level improvements translate into 25 percent savings in full-system energy-delay-product (EDP). This demonstrates that the proposed hybrid NoC for heterogeneous manycore architectures is capable of significantly accelerating training of CNNs while remaining energy-efficient.
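The full-system metric quoted in the abstract, the energy-delay product (EDP), is simply energy multiplied by execution time, so it rewards designs that are both fast and frugal. A minimal sketch of how a 25 percent EDP saving is computed; the baseline and hybrid-NoC numbers below are made-up placeholders, not measurements from the paper:

```python
# Illustration of the energy-delay-product (EDP) metric: lower is better.
# Energy and delay values here are hypothetical, for demonstration only.

def edp(energy_joules: float, delay_seconds: float) -> float:
    """Full-system energy-delay product."""
    return energy_joules * delay_seconds

baseline = edp(energy_joules=100.0, delay_seconds=10.0)   # wireline mesh NoC
hybrid   = edp(energy_joules=90.0,  delay_seconds=8.33)   # hybrid NoC

savings = 1.0 - hybrid / baseline
print(f"EDP savings: {savings:.0%}")   # prints "EDP savings: 25%"
```

Because EDP is a product, modest simultaneous improvements in energy and delay compound: here a 10 percent energy reduction combined with a roughly 17 percent delay reduction yields the 25 percent EDP saving.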


