International Conference on Compilers, Architecture and Synthesis for Embedded Systems

Compiler-decided dynamic memory allocation for scratch-pad based embedded systems



Abstract

The scratch-pad is a fast compiler-managed SRAM memory that replaces the hardware-managed cache. It is motivated by its better real-time guarantees versus cache and by its significantly lower overheads in energy consumption, area and overall runtime, even with a simple allocation scheme [4].

Existing scratch-pad allocation methods are of two types. First, software-caching schemes emulate the workings of a hardware cache in software. Instructions are inserted before each load/store to check the software-maintained cache tags. Such methods incur large overheads in runtime, code size, energy consumption and SRAM space for tags, and deliver poor real-time guarantees just like hardware caches. A second category of algorithms partitions variables at compile time into the two banks. For example, our previous work in [3] derives a provably optimal static allocation for global and stack variables and achieves a speedup over all earlier methods. However, a drawback of such static allocation schemes is that they do not account for dynamic program behavior. It is easy to see why a data allocation that never changes at runtime cannot achieve the full locality benefits of a cache.

In this paper we present a dynamic allocation method for global and stack data that, for the first time, (i) accounts for changing program requirements at runtime, (ii) has no software-caching tags, (iii) requires no run-time checks, (iv) has extremely low overheads, and (v) yields 100% predictable memory access times. In this method, data that is about to be accessed frequently is copied into the SRAM using compiler-inserted code at fixed and infrequent points in the program. Earlier data is evicted if necessary. When compared to a provably optimal static allocation, our results show runtime reductions ranging from 11% to 38%, averaging 31.2%, using no additional hardware support.
With hardware support for pseudo-DMA and full DMA, which is already provided in some commercial systems, the runtime reductions increase to 33.4% and 34.2% respectively.
