Custom-sized caches in application-specific memory hierarchies


Abstract

Developing FPGA implementations with an input specification in a high-level programming language such as C/C++ or OpenCL allows for a substantially shortened design cycle compared to a design entry at register transfer level. This work targets high-level synthesis (HLS) implementations that process large amounts of data and therefore require access to an off-chip memory. We leverage the customizability of the FPGA on-chip memory to automatically construct a multi-cache architecture in order to enhance the performance of the interface between parallel functional units of the HLS core and an external memory. Our focus is on automatic cache sizing. First, our technique identifies the block RAM resources left unused by the design and uses them to construct on-chip caches. Second, we devise a high-level cache performance estimation based on the memory access trace of the program. We use this memory trace to find a heterogeneous configuration of cache sizes, tailored to the application's memory access characteristics, that maximizes the performance of the multi-cache system subject to an on-chip memory resource constraint. We evaluate our technique with three benchmark implementations on an FPGA board and obtain a reduction in execution latency of up to 2× (1.5× on average) when compared to a one-size-fits-all cache sizing. We also quantify the impact of our automatically generated cache system on the overall energy consumption of the implementation.
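The trace-driven sizing idea in the abstract can be illustrated with a small sketch. This is not the authors' estimator or optimizer: it assumes direct-mapped caches, uses the raw miss count as a stand-in for performance, expresses the block RAM budget as a number of cache lines, and searches the candidate sizes exhaustively. The names `miss_count`, `best_sizes`, `stream`, and `loop` are hypothetical and chosen only for this illustration.

```python
from itertools import product


def miss_count(trace, num_lines, line_bytes=32):
    """Count misses of a direct-mapped cache with num_lines lines on an address trace."""
    tags = [None] * num_lines          # one stored block number per cache line
    misses = 0
    for addr in trace:
        block = addr // line_bytes     # memory block the address falls into
        index = block % num_lines      # cache line that block maps to
        if tags[index] != block:       # cold or conflict miss: fetch and replace
            tags[index] = block
            misses += 1
    return misses


def best_sizes(traces, candidate_lines, budget_lines):
    """Pick one cache size per trace that minimises total misses within the budget.

    Returns (total_misses, sizes), or None if no combination fits the budget.
    Assumption: the budget is a total number of cache lines, standing in for
    the left-over on-chip memory.
    """
    # Estimate every (trace, size) pair once from the recorded trace.
    table = [{n: miss_count(t, n) for n in candidate_lines} for t in traces]
    best = None
    for combo in product(candidate_lines, repeat=len(traces)):
        if sum(combo) > budget_lines:
            continue                   # violates the resource constraint
        cost = sum(table[i][n] for i, n in enumerate(combo))
        if best is None or cost < best[0]:
            best = (cost, combo)
    return best


# Two hypothetical access streams, one per functional unit:
stream = [i * 32 for i in range(16)]        # streaming, no reuse
loop = [(i % 8) * 32 for i in range(32)]    # 8-block working set, reused 4 times

print(best_sizes([stream, loop], [2, 4, 8], budget_lines=10))
# prints (24, (2, 8))
```

In this toy case an even split of 4 lines per cache yields 48 misses, while the heterogeneous split gives the streaming unit the smallest cache and the looping unit enough lines to hold its working set, halving the miss count under the same budget.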


Bibliographic details

Source
2015 International Conference on Field Programmable Technology | 2015 | pp. 144-151 | 8 pages
Conference location: Queenstown (SG)
Authors
Felix Winterstein; Kermin Fleming; Hsin-Jung Yang; John Wickerson; George Constantinides
Author affiliation
Dept. of Electr. Electron. Eng., Imperial Coll. London, London, UK
Original format: PDF
Text language: English

