Published in: Field-Programmable Custom Computing Machines (FCCM), 2012 IEEE 20th Annual International Symposium on

Impact of Cache Architecture and Interface on Performance and Area of FPGA-Based Processor/Parallel-Accelerator Systems



Abstract

We describe new multi-ported cache designs suitable for use in FPGA-based processor/parallel-accelerator systems and evaluate their impact on application performance and area. The baseline system comprises a MIPS soft processor and custom hardware accelerators with a shared memory architecture: an on-FPGA L1 cache backed by off-chip DDR2 SDRAM. Within this general system model, we evaluate traditional cache design parameters (cache size, line size, and associativity). In the parallel-accelerator context, we examine the impact of the cache design and its interface. Specifically, we look at how the number of cache ports affects performance when multiple hardware accelerators operate (and access memory) in parallel, and we evaluate two hardware implementations of multi-ported caches: 1) multi-pumping, and 2) a recently published approach based on the concept of a live-value table. Results show that application performance depends strongly on the cache interface and architecture: for a system with 6 accelerators, the average speedup ranges from 0.73× to 6.14×, depending on the cache design, relative to a baseline sequential system (a single accelerator and a direct-mapped 2KB cache with 32B lines). Considering both performance and area, the best architecture is found to be a 4-port multi-pump direct-mapped cache with a 16KB cache size and a 128B line size.
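To make the two cache configurations named in the abstract concrete, the following is a minimal sketch of how a direct-mapped cache splits an address into tag, index, and offset fields. It is an illustration only, not the authors' implementation: the class name `DirectMappedGeometry`, its methods, and the 32-bit address width are assumptions introduced here, and the sketch assumes power-of-two cache and line sizes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DirectMappedGeometry:
    """Hypothetical helper: address layout of a direct-mapped cache."""
    cache_bytes: int      # total cache capacity in bytes (power of two)
    line_bytes: int       # cache line size in bytes (power of two)
    addr_bits: int = 32   # assumption: 32-bit byte addresses

    @property
    def num_lines(self) -> int:
        # One line per set in a direct-mapped cache.
        return self.cache_bytes // self.line_bytes

    @property
    def offset_bits(self) -> int:
        # log2(line_bytes) for a power-of-two line size.
        return self.line_bytes.bit_length() - 1

    @property
    def index_bits(self) -> int:
        # log2(num_lines) for a power-of-two line count.
        return self.num_lines.bit_length() - 1

    @property
    def tag_bits(self) -> int:
        return self.addr_bits - self.index_bits - self.offset_bits

    def split(self, addr: int) -> tuple:
        """Return (tag, index, offset) for a byte address."""
        offset = addr & (self.line_bytes - 1)
        index = (addr >> self.offset_bits) & (self.num_lines - 1)
        tag = addr >> (self.offset_bits + self.index_bits)
        return tag, index, offset


# Baseline from the abstract: direct-mapped, 2KB cache, 32B lines.
baseline = DirectMappedGeometry(cache_bytes=2 * 1024, line_bytes=32)
# Best configuration from the abstract: direct-mapped, 16KB cache, 128B lines.
best = DirectMappedGeometry(cache_bytes=16 * 1024, line_bytes=128)

for name, geom in (("baseline", baseline), ("best", best)):
    print(name, geom.num_lines, geom.offset_bits, geom.index_bits, geom.tag_bits)
```

Under these assumptions, the baseline has 64 lines (5 offset bits, 6 index bits) and the 16KB/128B configuration has 128 lines (7 offset bits, 7 index bits). The sketch does not model the number of ports, multi-pumping, or the live-value table.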
