Enabling Highly Efficient Capsule Networks Processing Through Software-Hardware Co-Design

Zhang Xingyao; Fu Xin; Zhuang Donglin; Xie Chenhao; Song Shuaiwen Leon

Abstract

As the demand for image processing grows, image features are becoming increasingly complicated. Although Convolutional Neural Networks (CNNs) have been widely adopted for image processing tasks, they have been found to be easily misled because of their heavy use of pooling operations. A novel neural network structure called the Capsule Network (CapsNet) has been proposed to address this CNN challenge and to essentially enhance the learning ability for image segmentation and object detection. Because CapsNet involves a high volume of matrix operations, it has generally been accelerated on modern GPU platforms with highly optimized deep-learning libraries. However, the routing procedure of CapsNet introduces special program and execution features, including massive unshareable intermediate variables and intensive synchronizations, which cause inefficient CapsNet execution on modern GPUs. To address these challenges, we propose a set of software-hardware co-designed optimizations, SH-CapsNet, which comprises software-level optimizations named S-CapsNet and a hybrid computing architecture design named PIM-CapsNet. At the software level, S-CapsNet reduces computation and memory accesses by exploiting the computational redundancy and data similarity of the routing procedure. At the hardware level, PIM-CapsNet leverages the processing-in-memory capability of today's 3D stacked memory to provide an off-chip in-memory acceleration solution for the routing procedure, pipelined with the GPU's on-chip computing capability for accelerating the CNN-type layers in CapsNet. Evaluation results demonstrate that either our software or our hardware optimizations alone can significantly improve CapsNet execution efficiency. Together, our co-design achieves large improvements in both performance (3.41x) and energy savings (68.72 percent) for CapsNet inference, with negligible accuracy loss.
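
For readers unfamiliar with the routing procedure the abstract refers to, the following is a minimal sketch, assuming the routing-by-agreement scheme from the original CapsNet work. It is not the S-CapsNet or PIM-CapsNet method, and the function names (`squash`, `softmax`, `dynamic_routing`) and the nested-list data layout are illustrative choices, not taken from the paper. It is meant only to show where the per-pair intermediate variables and the per-iteration synchronization points arise.

```python
import math


def squash(s):
    """Squash nonlinearity: keeps the direction of s, maps its length into [0, 1)."""
    sq = sum(x * x for x in s)
    if sq == 0.0:
        return [0.0] * len(s)
    scale = sq / (1.0 + sq) / math.sqrt(sq)
    return [scale * x for x in s]


def softmax(row):
    """Numerically stable softmax over one list of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat, iterations=3):
    """u_hat[i][j] is the prediction vector from low-level capsule i
    for high-level capsule j. Returns (v, c): the high-level capsule
    outputs and the coupling coefficients used in the last iteration."""
    n_low, n_high, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    # One routing logit per (i, j) pair: intermediate state that is
    # private to each pair and updated on every iteration.
    b = [[0.0] * n_high for _ in range(n_low)]
    v, c = [], []
    for _ in range(iterations):
        # Coupling coefficients: softmax over the high-level capsules.
        c = [softmax(row) for row in b]
        v = []
        for j in range(n_high):
            # Weighted sum over ALL low-level capsules (a reduction).
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_low))
                   for d in range(dim)]
            v.append(squash(s_j))
        # Agreement update: every v[j] must be complete before any
        # logit can be updated, so each iteration ends in a barrier.
        for i in range(n_low):
            for j in range(n_high):
                b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v, c
```

Called with two low-level capsules that agree on the first high-level capsule and disagree on the second, for example `dynamic_routing([[[1.0, 0.0], [0.0, 0.1]], [[1.0, 0.0], [0.0, -0.1]]])`, the coupling coefficients shift toward the first high-level capsule over the iterations. The logits `b` and the predictions `u_hat` scale with the number of capsule pairs, which gives a rough sense of why the abstract describes the routing stage as heavy on intermediate variables and synchronization.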

Bibliographic details

Source
IEEE Transactions on Computers, 2021, Issue 4, pp. 495-510 (16 pages)
Authors
Zhang Xingyao; Fu Xin; Zhuang Donglin; Xie Chenhao; Song Shuaiwen Leon
Author affiliations

Univ Houston Dept Elect & Comp Engn Houston TX 77004 USA;

Univ Houston Dept Elect & Comp Engn Houston TX 77004 USA;

Univ Sydney Future Syst Architecture FSA Lab Sydney NSW 2006 Australia;

Pacific Northwest Natl Lab PNNL Richland WA 99354 USA;

Univ Sydney Future Syst Architecture FSA Lab Sydney NSW 2006 Australia;

Original format: PDF
Language: English
Keywords
Accelerators; domain-specific architectures; machine learning; emerging technologies;

