2015 International Conference on Field Programmable Technology

Custom-sized caches in application-specific memory hierarchies



Abstract

Developing FPGA implementations from an input specification in a high-level programming language such as C/C++ or OpenCL substantially shortens the design cycle compared with design entry at the register transfer level (RTL). This work targets high-level synthesis (HLS) implementations that process large amounts of data and therefore require access to off-chip memory. We leverage the customizability of the FPGA's on-chip memory to automatically construct a multi-cache architecture that improves the performance of the interface between the parallel functional units of the HLS core and external memory. Our focus is on automatic cache sizing. First, our technique determines which block RAM resources the design leaves unused and uses them to build on-chip caches. Second, we devise a high-level cache performance estimation based on the program's memory access trace. We use this trace to find a heterogeneous configuration of cache sizes, tailored to the application's memory access characteristics, that maximizes the performance of the multi-cache system subject to an on-chip memory resource constraint. We evaluate our technique with three benchmark implementations on an FPGA board and obtain a reduction in execution latency of up to 2× (1.5× on average) compared with one-size-fits-all cache sizing. We also quantify the impact of the automatically generated cache system on the overall energy consumption of the implementation.
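The abstract does not spell out the estimation model or the search procedure, so the following is only a minimal Python sketch of the general idea under stated assumptions. Each parallel functional unit is assumed to get its own direct-mapped cache with 64-byte lines. A cache's quality is estimated by replaying its address trace and counting misses, used here as a stand-in for latency. The cache sizes are then chosen jointly by a knapsack-style dynamic program so that the total miss count is minimal while all caches fit into the left-over block RAMs. All names (`simulate_misses`, `allocate`, `uniform_allocation`, `lines_per_bram`) and all numbers are hypothetical and are not taken from the paper.

```python
"""Hypothetical sketch: trace-driven sizing of several on-chip caches under a BRAM budget."""


def simulate_misses(trace, num_lines, line_bytes=64):
    """Count misses of a direct-mapped cache with `num_lines` lines on an address trace."""
    resident = [None] * num_lines  # block address currently held by each cache line
    misses = 0
    for addr in trace:
        block = addr // line_bytes   # memory block (cache-line-sized chunk) being accessed
        index = block % num_lines    # direct-mapped placement
        if resident[index] != block:
            resident[index] = block  # miss: fetch the block from off-chip memory
            misses += 1
    return misses


def blocks_needed(num_lines, lines_per_bram):
    """BRAM blocks needed to hold `num_lines` cache lines (ceiling division)."""
    return -(-num_lines // lines_per_bram)


def allocate(traces, candidate_lines, budget, lines_per_bram):
    """Pick one size per cache that minimizes total misses using at most `budget` BRAMs.

    Solved as a multiple-choice knapsack by dynamic programming over BRAMs used.
    """
    INF = float("inf")
    # dp[b] = (total misses, chosen sizes) for the caches seen so far using exactly b BRAMs
    dp = [(INF, None)] * (budget + 1)
    dp[0] = (0, [])
    for trace in traces:
        # (BRAM cost, estimated misses, size) for every candidate size of this cache
        options = [(blocks_needed(n, lines_per_bram), simulate_misses(trace, n), n)
                   for n in candidate_lines]
        new = [(INF, None)] * (budget + 1)
        for used, (misses_so_far, sizes) in enumerate(dp):
            if sizes is None:
                continue  # this BRAM count is unreachable
            for cost, misses, n in options:
                total = used + cost
                if total <= budget and misses_so_far + misses < new[total][0]:
                    new[total] = (misses_so_far + misses, sizes + [n])
        dp = new
    best_misses, best_sizes = min(dp, key=lambda entry: entry[0])
    if best_sizes is None:
        raise ValueError("BRAM budget too small for even the smallest caches")
    return best_sizes, best_misses


def uniform_allocation(traces, candidate_lines, budget, lines_per_bram):
    """Baseline: the largest single size that fits when every cache gets the same size."""
    feasible = [n for n in candidate_lines
                if len(traces) * blocks_needed(n, lines_per_bram) <= budget]
    if not feasible:
        raise ValueError("BRAM budget too small for even the smallest caches")
    n = max(feasible)
    return n, sum(simulate_misses(trace, n) for trace in traces)


if __name__ == "__main__":
    total_brams, core_brams = 10, 7          # hypothetical device size and HLS-core usage
    budget = total_brams - core_brams        # left-over BRAMs available for caches
    trace_a = [0, 64, 128, 192] * 3          # loops over 4 blocks: benefits from 4 lines
    trace_b = [0] * 12                       # touches one block: 1 line is enough
    traces = [trace_a, trace_b]
    print(allocate(traces, [1, 2, 4, 8], budget, lines_per_bram=2))            # ([4, 1], 5)
    print(uniform_allocation(traces, [1, 2, 4, 8], budget, lines_per_bram=2))  # (2, 13)
```

In this toy example, two lines per cache is the largest uniform size that fits both caches into three BRAMs. Giving four lines to the cache with the loop-shaped trace and one line to the cache with a single hot block instead cuts the estimated misses from 13 to 5, which illustrates on a tiny scale why a heterogeneous configuration can beat one-size-fits-all sizing. Exhaustive dynamic programming is used here because it is exact for small numbers of caches and candidate sizes; a real tool might use a different search.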
