Fortschritte der Physik

On-Chip Communication Network for Efficient Training of Deep Convolutional Networks on Heterogeneous Manycore Systems



Abstract

Convolutional Neural Networks (CNNs) have shown a great deal of success in diverse application domains including computer vision, speech recognition, and natural language processing. However, as the size of datasets and the depth of neural network architectures continue to grow, it is imperative to design high-performance and energy-efficient computing hardware for training CNNs. In this paper, we consider the problem of designing specialized CPU-GPU based heterogeneous manycore systems for energy-efficient training of CNNs. It has already been shown that the typical on-chip communication infrastructures employed in conventional CPU-GPU based heterogeneous manycore platforms are unable to handle both CPU and GPU communication requirements efficiently. To address this issue, we first analyze the on-chip traffic patterns that arise from the computational processes associated with training two deep CNN architectures, namely, LeNet and CDBNet, to perform image classification. By leveraging this knowledge, we design a hybrid Network-on-Chip (NoC) architecture, which consists of both wireline and wireless links, to improve the performance of CPU-GPU based heterogeneous manycore platforms running the above-mentioned CNN training workloads. The proposed NoC achieves 1.8x reduction in network latency and improves the network throughput by a factor of 2.2 for training CNNs, when compared to a highly-optimized wireline mesh NoC. For the considered CNN workloads, these network-level improvements translate into 25 percent savings in full-system energy-delay-product (EDP). This demonstrates that the proposed hybrid NoC for heterogeneous manycore architectures is capable of significantly accelerating training of CNNs while remaining energy-efficient.
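The full-system metric quoted in the abstract, the energy-delay product (EDP), is simply energy multiplied by execution time, so it rewards designs that are both fast and frugal. A minimal sketch of how a 25 percent EDP saving is computed; the baseline and hybrid-NoC numbers below are made-up placeholders, not measurements from the paper:

```python
# Illustration of the energy-delay-product (EDP) metric: lower is better.
# Energy and delay values here are hypothetical, for demonstration only.

def edp(energy_joules: float, delay_seconds: float) -> float:
    """Full-system energy-delay product."""
    return energy_joules * delay_seconds

baseline = edp(energy_joules=100.0, delay_seconds=10.0)   # wireline mesh NoC
hybrid   = edp(energy_joules=90.0,  delay_seconds=8.33)   # hybrid NoC

savings = 1.0 - hybrid / baseline
print(f"EDP savings: {savings:.0%}")   # prints "EDP savings: 25%"
```

Because EDP is a product, modest simultaneous improvements in energy and delay compound: here a 10 percent energy reduction combined with a roughly 17 percent delay reduction yields the 25 percent EDP saving.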


