International Conference on Omni-layer Intelligent Systems

A Microcontroller is All You Need: Enabling Transformer Execution on Low-Power IoT Endnodes

Abstract

Transformer networks have become state-of-the-art for many tasks such as NLP and are closing the gap on other tasks like image recognition. Similarly, Transformers and Attention methods are starting to attract attention on smaller-scale tasks, which fit the typical memory envelope of MCUs. In this work, we propose a new set of execution kernels tuned for efficient execution on MCU-class RISC-V and ARM Cortex-M cores. We focus on minimizing memory movements while maximizing data reuse in the Attention layers. With our library, we obtain 3.4×, 1.8×, and 2.1× lower latency and energy on 8-bit Attention layers, compared to previous state-of-the-art (SoA) linear and matrix multiplication kernels in the CMSIS-NN and PULP-NN libraries on the STM32H7 (Cortex M7), STM32L4 (Cortex M4), and GAP8 (RISC-V IMC-Xpulp) platforms, respectively. As a use case for our TinyTransformer library, we also demonstrate that we can fit a 263 kB Transformer on the GAP8 platform, outperforming the previous SoA convolutional architecture on the TinyRadarNN dataset, with a latency of 9.24 ms and 0.47 mJ energy consumption and an accuracy improvement of 3.5%.
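To make the data-reuse idea concrete, below is a minimal illustrative sketch in C of an 8-bit attention-score kernel (S = Q·K^T with requantization back to int8). It is not the paper's TinyTransformer library: the sequence length, head dimension, function name attention_scores_q8, and the fixed-point requantization parameters are all hypothetical. The point it shows is that each row of Q is loaded once and reused against every row of K, the kind of memory-movement reduction the paper's kernels optimize far more aggressively on the target cores.

#include <stdint.h>
#include <stdio.h>

#define SEQ_LEN 16   /* hypothetical sequence length */
#define HEAD_DIM 8   /* hypothetical per-head dimension */

/* Simplified 8-bit attention-score kernel: S = Q * K^T, requantized
 * to int8. Each row of Q is fetched once and reused across all rows
 * of K; real MCU kernels add tiling and SIMD on top of this idea. */
static void attention_scores_q8(const int8_t *q, const int8_t *k,
                                int8_t *s, int32_t mult, int shift)
{
    for (int i = 0; i < SEQ_LEN; i++) {
        const int8_t *q_row = &q[i * HEAD_DIM]; /* reused for all j */
        for (int j = 0; j < SEQ_LEN; j++) {
            const int8_t *k_row = &k[j * HEAD_DIM];
            int32_t acc = 0;
            for (int d = 0; d < HEAD_DIM; d++)
                acc += (int32_t)q_row[d] * (int32_t)k_row[d];
            /* Fixed-point requantization to int8 (simplified). */
            int32_t out = (int32_t)(((int64_t)acc * mult) >> shift);
            if (out > 127) out = 127;
            if (out < -128) out = -128;
            s[i * SEQ_LEN + j] = (int8_t)out;
        }
    }
}

int main(void)
{
    int8_t q[SEQ_LEN * HEAD_DIM], k[SEQ_LEN * HEAD_DIM];
    int8_t s[SEQ_LEN * SEQ_LEN];
    for (int i = 0; i < SEQ_LEN * HEAD_DIM; i++) {
        q[i] = (int8_t)(i % 7 - 3); /* toy test data */
        k[i] = (int8_t)(i % 5 - 2);
    }
    attention_scores_q8(q, k, s, 1 << 14, 16); /* arbitrary requant params */
    printf("s[0][0] = %d\n", s[0][0]);
    return 0;
}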