Space-address decoupled scratchpad memory management for neural network accelerators


Abstract

Deep neural networks have been demonstrated to be useful in a variety of intelligent tasks, and various specialized NN accelerators have been proposed recently to improve hardware efficiency. These accelerators are typically equipped with software-managed scratchpad memory (SPM) for high performance and energy efficiency. However, traditional SPM management techniques cause memory fragmentation on NN accelerators and thus lead to low utilization of the precious SPM. The main reason is that traditional techniques were originally designed for managing fixed-length registers rather than variable-length memory blocks. In this article, we propose a novel SPM management approach for NN accelerators. The basic intuition is that NN computation/memory behaviors are predictable and relatively regular compared with traditional applications, so most information can be determined at compile time. In addition, by exploiting the variable-length feature of SPM, we propose to divide the allocation process into two passes: the space assignment pass and the address assignment pass, which are performed simultaneously (and implicitly) in traditional one-pass allocation techniques. Experimental results on the memory requests of a representative NN accelerator demonstrate that the proposed approach can reduce memory consumption by up to 30% compared with state-of-the-art SPM management techniques, and its memory usage is only 2% larger than that of the theoretical optimal allocation.
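The abstract only sketches the two-pass idea at a high level. As a rough illustration of what decoupling space assignment from address assignment can mean for compile-time-known buffer lifetimes, the following is a minimal, hypothetical sketch: pass 1 computes the peak total size of simultaneously live buffers (space is reserved without fixing addresses), and pass 2 picks concrete base addresses with a simple first-fit policy that reuses the address ranges of non-overlapping lifetimes. All names and the placement policy here are illustrative assumptions, not the paper's actual algorithm.

```python
from dataclasses import dataclass

@dataclass
class Buffer:
    name: str
    size: int
    start: int   # first time step the buffer is live
    end: int     # last time step the buffer is live (inclusive)

def peak_live_size(buffers):
    """Pass 1 ("space assignment"): lower bound on the SPM capacity
    needed, i.e., the maximum total size of simultaneously live buffers."""
    events = []
    for b in buffers:
        events.append((b.start, b.size))      # buffer becomes live
        events.append((b.end + 1, -b.size))   # buffer is freed
    events.sort()
    cur = peak = 0
    for _, delta in events:
        cur += delta
        peak = max(peak, cur)
    return peak

def assign_addresses(buffers):
    """Pass 2 ("address assignment"): first-fit placement that reuses
    the address range of buffers whose lifetimes do not overlap."""
    placed = []   # list of (offset, Buffer) already assigned
    addrs = {}
    for b in sorted(buffers, key=lambda x: x.start):
        # Address ranges of buffers whose lifetime overlaps with b's.
        conflicts = sorted(
            (off, off + p.size) for off, p in placed
            if not (p.end < b.start or p.start > b.end))
        off = 0
        for lo, hi in conflicts:
            if off + b.size <= lo:   # b fits in the gap before this range
                break
            off = max(off, hi)       # otherwise slide past it
        placed.append((off, b))
        addrs[b.name] = off
    return addrs
```

For example, with buffers A (100 B, live at steps 0-1), B (50 B, steps 1-2), and C (100 B, steps 2-3), pass 1 reports a 150 B peak, and pass 2 places C back at offset 0 because its lifetime never overlaps A's.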

Bibliographic information

  • Source
    Concurrency and Computation: Practice and Experience, 2021, Issue 6, pp. e6046.1-e6046.13 (13 pages)
  • Author affiliations

    Univ Sci & Technol China Sch Comp Sci & Technol Hefei Peoples R China|Nanjing Univ Aeronaut & Astronaut Nanjing Peoples R China|Chinese Acad Sci Inst Comp Technol SKL Comp Architecture Beijing Peoples R China;

    Cambricon Technol Beijing Peoples R China;

    Chinese Acad Sci Inst Comp Technol SKL Comp Architecture Beijing Peoples R China;

    Nanjing Univ Aeronaut & Astronaut Nanjing Peoples R China;

    Nanjing Univ Aeronaut & Astronaut Nanjing Peoples R China;

    Nanjing Univ Aeronaut & Astronaut Nanjing Peoples R China|Univ Chinese Acad Sci Sch Comp Sci & Technol Beijing Peoples R China;

  • Indexing information
  • Original format: PDF
  • Language: English
  • CLC classification
  • Keywords

    deep neural network; memory management; scratchpad memory;


