IEEE Transactions on Very Large Scale Integration (VLSI) Systems

FEECA: Design Space Exploration for Low-Latency and Energy-Efficient Capsule Network Accelerators



Abstract

In the past few years, Capsule Networks (CapsNets) have taken the spotlight from traditional convolutional neural networks (CNNs) for image classification. Unlike CNNs, CapsNets can learn the spatial relationships between image features. However, their complexity grows because of their heterogeneous capsule structure and dynamic routing, an iterative algorithm that learns the coupling coefficients between two consecutive capsule layers. This necessitates specialized hardware accelerators for CapsNets. Moreover, a high-performance and energy-efficient design of CapsNet accelerators requires exploration of different design decisions (such as the size and configuration of the processing array and the structure of the processing elements). Toward this, we make the following key contributions: 1) FEECA, a novel methodology to explore the design space of the (micro)architectural parameters of a CapsNet hardware accelerator, and 2) CapsAcc, the first specialized RTL-level hardware architecture to perform CapsNet inference with high performance and high energy efficiency. Our CapsAcc achieves significant performance improvement over an optimized GPU implementation, due to its efficient implementation of key activation functions, such as squash and softmax, and efficient data reuse for dynamic routing. The FEECA methodology employs the Non-dominated Sorting Genetic Algorithm (NSGA-II) to explore the Pareto-optimal points with respect to area, performance, and energy consumption. This requires analytical modeling of the number of clock cycles required to perform each operation of the CapsNet inference, and of the memory accesses, to enable a fast yet accurate design space exploration.
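For context, the squash nonlinearity mentioned above (introduced in the original CapsNet paper) can be sketched in NumPy as follows. This is an illustrative reference model only, not the CapsAcc hardware implementation; the function name and the `eps` numerical guard are our own choices:

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Squashing nonlinearity: v = (||s||^2 / (1 + ||s||^2)) * (s / ||s||).
    Short vectors shrink toward zero; long vectors approach unit length,
    so the vector norm can encode the probability that an entity exists."""
    sq_norm = np.sum(s * s, axis=axis, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm)
    return scale * s / np.sqrt(sq_norm + eps)
```

For example, `squash(np.array([3.0, 4.0]))` has norm 25/26, slightly below 1, while a short input such as `[0.1, 0.0]` is squashed to a norm near zero.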
We synthesized the complete accelerator architecture in a 45-nm CMOS technology using Synopsys design tools and evaluated it for the MNIST benchmark (as done by the original CapsNet paper from Google Brain's team) and for a more complex data set, the German Traffic Sign Recognition Benchmark (GTSRB).
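The Pareto-optimality criterion that NSGA-II applies to area, performance, and energy can be illustrated with a minimal dominance check. The candidate tuples below are hypothetical, and this plain non-dominated filter stands in for NSGA-II's full non-dominated sorting (all three objectives minimized):

```python
def dominates(a, b):
    """a dominates b if a is no worse in every objective and strictly
    better in at least one. Objectives here: (area, latency, energy)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Return the non-dominated subset (NSGA-II's first front)."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

# Hypothetical (area mm^2, latency ms, energy mJ) for four accelerator configs.
configs = [(1.0, 2.0, 3.0), (2.0, 1.0, 3.0), (2.0, 2.0, 4.0), (0.5, 3.0, 2.0)]
front = pareto_front(configs)
```

Here `(2.0, 2.0, 4.0)` is dominated by `(1.0, 2.0, 3.0)` and drops out; the other three configurations are mutually non-dominated trade-offs, which is the kind of front FEECA's exploration reports to the designer.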

