IEEE Transactions on Parallel and Distributed Systems

Multi-FPGA Accelerator for Scalable Stencil Computation with Constant Memory Bandwidth

Abstract

Stencil computation is one of the important kernels in scientific computing. However, its sustained performance is limited by memory bandwidth, especially on multicore microprocessors and graphics processing units (GPUs), because of its small operational intensity. In this paper, we present a custom computing machine (CCM), called a scalable streaming-array (SSA), for high-performance stencil computation with multiple field-programmable gate arrays (FPGAs). We design SSA based on a domain-specific programmable concept, in which CCMs are programmable with the minimum functionality required for an algorithm domain. We employ a deep pipelining approach over successive iterations to achieve linear scalability across multiple devices with a constant memory bandwidth. A prototype implementation using nine FPGAs shows good agreement with a performance model and achieves 260 and 236 GFlop/s for 2D and 3D Jacobi computations, respectively, which are 87.4 and 83.9 percent of the peak performance, with a memory bandwidth of only 2.0 GB/s. We also evaluate the performance of SSA on state-of-the-art FPGAs.
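The idea of pipelining successive iterations can be illustrated in software. The following is a minimal Python sketch, not the SSA architecture: it assumes a 2D Jacobi update of the common form in which each interior point becomes the average of its four neighbours and boundary values stay fixed, and the names `jacobi_stage`, `pipeline`, and `jacobi_reference` are hypothetical. Each stage performs one iteration while holding only a three-row window, so chaining `n_stages` stages reads the input grid once and produces the result once, however many iterations are applied. In hardware the stages would run concurrently (for example, spread across devices); Python generators only model the data flow.

```python
from collections import deque
from typing import Iterable, Iterator, List

Row = List[float]


def _update_row(up: Row, mid: Row, down: Row) -> Row:
    """Assumed 4-neighbour Jacobi update of one interior row.

    The first and last columns are treated as fixed boundary values.
    """
    if len(mid) < 3:
        return list(mid)  # no interior columns to update
    inner = [0.25 * (up[j] + down[j] + mid[j - 1] + mid[j + 1])
             for j in range(1, len(mid) - 1)]
    return [mid[0]] + inner + [mid[-1]]


def jacobi_stage(rows: Iterable[Row]) -> Iterator[Row]:
    """One Jacobi iteration as a streaming stage (hypothetical name).

    Only a three-row window (a line buffer) is kept, so the stage can emit
    an updated row as soon as the row below it has arrived.
    """
    window: deque = deque(maxlen=3)
    count = 0
    for row in rows:
        count += 1
        window.append(row)
        if count == 1:
            yield list(row)  # top boundary row passes through unchanged
        elif count >= 3:
            up, mid, down = window
            yield _update_row(up, mid, down)
    if count >= 2:
        yield list(window[-1])  # bottom boundary row passes through unchanged


def pipeline(grid: Iterable[Row], n_stages: int) -> List[Row]:
    """Chain n_stages streaming stages: the input is read once and the
    output is produced once, no matter how many iterations are applied."""
    stream: Iterator[Row] = iter(grid)
    for _ in range(n_stages):
        stream = jacobi_stage(stream)
    return list(stream)


def jacobi_reference(grid: List[Row], n_iters: int) -> List[Row]:
    """Conventional version that sweeps the whole grid once per iteration."""
    g = [list(r) for r in grid]
    for _ in range(n_iters):
        if len(g) < 3:
            break  # no interior rows to update
        g = ([list(g[0])]
             + [_update_row(g[i - 1], g[i], g[i + 1]) for i in range(1, len(g) - 1)]
             + [list(g[-1])])
    return g


if __name__ == "__main__":
    grid = [[1.0, 1.0, 1.0, 1.0],
            [1.0, 0.0, 0.0, 1.0],
            [1.0, 0.0, 0.0, 1.0],
            [1.0, 1.0, 1.0, 1.0]]
    # Streaming through chained stages gives the same result as repeated sweeps.
    assert pipeline(grid, 5) == jacobi_reference(grid, 5)
    print(pipeline(grid, 1)[1])  # [1.0, 0.5, 0.5, 1.0]
```

Because the off-chip traffic per pass stays the same as more stages are chained, each added stage increases the work done per byte transferred, which is roughly the sense in which throughput can scale with a constant memory bandwidth. For scale, the reported 2D result of 260 GFlop/s at 2.0 GB/s corresponds to 130 floating-point operations per byte of memory bandwidth.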