IEEE Transactions on Very Large Scale Integration (VLSI) Systems

NeuronLink: An Efficient Chip-to-Chip Interconnect for Large-Scale Neural Network Accelerators


Abstract

Large-scale neural network (NN) accelerators typically consist of several processing nodes, each of which can be implemented as a multi- or many-core chip organized via a network-on-chip (NoC) to handle the heavy neuron-to-neuron traffic. Multiple NoC-based NN chips are connected through chip-to-chip interconnection networks to further boost the overall neural acceleration capability. Huge amounts of multicast-based traffic travel on-chip or across chips, making the interconnection network design more challenging and turning it into the bottleneck of NN system performance and energy. In this article, we propose NeuronLink, a set of coupled intrachip and interchip communication techniques for NN accelerators. For intrachip communication, we propose scoring crossbar arbitration, arbitration interception, and route computation parallelization techniques for virtual-channel routing, leading to a high-throughput NoC with lower hardware cost for multicast-based traffic. For interchip communication, we propose a lightweight and NoC-aware chip-to-chip interconnection scheme, enabling efficient interconnection of NoC-based NN chips. In addition, we evaluate the proposed techniques on four connected NoC-based deep neural network (DNN) chips implemented with four field-programmable gate arrays (FPGAs). The experimental results show that the proposed interconnection network can efficiently manage the data traffic inside DNNs, with higher throughput and lower overhead than state-of-the-art interconnects.
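The abstract names scoring crossbar arbitration as one of the virtual-channel routing techniques but does not describe the scoring function or the arbiter structure. The sketch below is only a minimal illustration of the general idea of score-based crossbar arbitration, assuming a hypothetical score derived from flit age and multicast fan-out; the class names and the scoring formula are assumptions for illustration, not the paper's design.

```python
# Illustrative sketch (not NeuronLink's implementation): a score-based
# crossbar arbiter for an NoC router. Each input virtual channel (VC)
# requesting an output port carries a score; every output port grants the
# highest-scoring request each cycle. The scoring inputs (age, multicast
# fan-out) and weights are hypothetical assumptions.

from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    input_vc: int      # identifier of the requesting input virtual channel
    output_port: int   # crossbar output port being requested
    age: int           # cycles the flit has waited (assumed score input)
    fanout: int        # number of multicast destinations (assumed score input)

    @property
    def score(self) -> int:
        # Hypothetical scoring: prefer older flits, and break ties toward
        # wider multicasts so replicated traffic drains faster.
        return self.age * 4 + self.fanout


def arbitrate(requests: List[Request]) -> Dict[int, Request]:
    """Grant at most one request per output port, choosing the highest score."""
    grants: Dict[int, Request] = {}
    for req in requests:
        best = grants.get(req.output_port)
        if best is None or req.score > best.score:
            grants[req.output_port] = req
    return grants


if __name__ == "__main__":
    reqs = [
        Request(input_vc=0, output_port=2, age=3, fanout=1),
        Request(input_vc=1, output_port=2, age=1, fanout=4),
        Request(input_vc=2, output_port=3, age=0, fanout=2),
    ]
    for port, winner in sorted(arbitrate(reqs).items()):
        print(f"output port {port} granted to input VC {winner.input_vc} (score {winner.score})")
```

In a hardware realization, the per-output-port maximum would typically be selected by a comparator tree within a single cycle; the sequential Python form above is only for clarity.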

